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Chapter 1 


Foreword 


1.1 What are Dynamic Games? 


Dynamic Games are mathematical models of the interaction between different agents 
who are controlling a dynamical system. Such situations occur in many instances like 
armed conflicts (e.g. duel between a bomber and a jet fighter), economic competition 
(e.g. investments in R&D for computer companies), parlor games (Chess, Bridge). 
These examples concern dynamical systems since the actions of the agents (also called 
players) influence the evolution over time of the state of a system (position and velocity 
of aircraft, capital of know-how for Hi-Tech firms, positions of remaining pieces on a 
chess board, etc). The difficulty in deciding what should be the behavior of these 
agents stems from the fact that each action an agent takes at a given time will influence 
the reaction of the opponent(s) at later time. These notes are intended to present the 
basic concepts and models which have been proposed in the burgeoning literature on 
game theory for a representation of these dynamic interactions. 


1.2 Origins of these Lecture Notes 


These notes are based on several courses on Dynamic Games taught by the authors, 
in different universities or summer schools, to a variety of students in engineering, 
economics and management science. The notes use also some documents prepared in 
cooperation with other authors, in particular B. Tolwinski [Tolwinski, 1988]. 


These notes are written for control engineers, economists or management scien- 
tists interested in the analysis of multi-agent optimization problems, with a particular 
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emphasis on the modeling of conflict situations. This means that the level of mathe- 
matics involved in the presentation will not go beyond what is expected to be known by 
a student specializing in control engineering, quantitative economics or management 
science. These notes are aimed at last-year undergraduate, first year graduate students. 


The Control engineers will certainly observe that we present dynamic games as an 
extension of optimal control whereas economists will see also that dynamic games are 
only a particular aspect of the classical theory of games which is considered to have 
been launched in [Von Neumann & Morgenstern 1944]. Economic models of imper- 
fect competition, presented as variations on the classic” Cournot model [Cournot, 1838], 
will serve recurrently as an illustration of the concepts introduced and of the theories 
developed. An interesting domain of application of dynamic games, which is described 
in these notes, relates to environmental management. The conflict situations occur- 
ring in fisheries exploitation by multiple agents or in policy coordination for achieving 
global environmental control (e.g. in the control of a possible global warming effect) 
are well captured in the realm of this theory. 


The objects studied in this book will be dynamic. The term dynamic comes from 
Greek dynasthai (which means to be able) and refers to phenomena which undergo 
a time-evolution. In these notes, most of the dynamic models will be discrete time. 
This implies that, for the mathematical description of the dynamics, difference (rather 
than differential) equations will be used. That, in turn, should make a great part of the 
notes accessible, and attractive, to students who have not done advanced mathematics. 
However, there will still be some developments involving a continuous time description 
of the dynamics and which have been written for readers with a stronger mathematical 
background. 


1.3 Motivation 


There is no doubt that a course on dynamic games suitable for both control engineer- 
ing students and economics or management science students requires a specialized 
textbook. 


Since we emphasize the detailed description of the dynamics of some specific sys- 
tems controlled by the players we have to present rather sophisticated mathematical 
notions, related to control theory. This presentation of the dynamics must be accom- 
panied by an introduction to the specific mathematical concepts of game theory. The 
originality of our approach is in the mixing of these two branches of applied mathe- 
matics. 


There are many good books on classical game theory. A nonexhaustive list in- 
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cludes [Owen, 1982], [Shubik, 1975a], [Shubik, 1975b], [Aumann, 1989], and more 
recently [Friedman 1986] and [Fudenberg & Tirole, 1991]. However, they do not in- 
troduce the reader to the most general dynamic games. [Basar & Olsder, 1982] does 
cover extensively the dynamic game paradigms, however, readers without a strong 
mathematical background will probably find that book difficult. This text is therefore 
a modest attempt to bridge the gap. 
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Part I 


Elements of Classical Game Theory 


Chapter 2 


Decision Analysis with Many Agents 


As we said in the introduction to these notes dynamic games constitute a subclass 
of the mathematical models studied in what is usually called the classical theory of 
game. It is therefore proper to start our exposition with those basic concepts of game 
theory which provide the fundamental tread of the theory of dynamic games. For 
an exhaustive treatment of most of the definitions of classical game theory see e.g. 
[Owen, 1982], [Shubik, 1975a], [Friedman 1986] and [Fudenberg & Tirole, 1991]. 


2.1 The Basic Concepts of Game Theory 


In a game we deal with the following concepts 


e Players. They will compete in the game. Notice that a player may be an indi- 
vidual, a set of individuals (or a team, a corporation, a political party, a nation, 
a pilot of an aircraft, a captain of a submarine, etc. . 


e A move or a decision will be a player’s action. Also, borrowing a term from 
control theory, a move will be realization of a player’s control or, simply, his 
control. 


e A player’s (pure) strategy will be a rule (or function) that associates a player’s 
move with the information available to him! at the time when he decides which 
move to choose. 





‘Political correctness promotes the usage of gender inclusive pronouns “they” and “their”. However, 
in games, we will frequently have to address an individual player’s action and distinguish it from a 
collective action taken by a set of several players. As far as we know, in English, this distinction is 
only possible through usage of the traditional grammar gender exclusive pronouns: possessive “his”, 
“her” and personal “he”, “she”. We find that the traditional grammar better suits your purpose (to avoid) 
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e A player’s mixed strategy is a probability measure on the player’s space of pure 
strategies. In other words, a mixed strategy consists of a random draw of a pure 
strategy. The player controls the probabilities in this random experiment. 


e A player’s behavioral strategy is a rule which defines a random draw of the ad- 
missible move as a function of the information available’. These strategies are 
intimately linked with mixed strategies and it has been proved early [Kuhn, 1953] 
that, for many games the two concepts coincide. 


e Payoffs are real numbers measuring desirability of the possible outcomes of the 
game, e.g. , the amounts of money the players may win (or loose). Other names 
of payoffs can be: rewards, performance indices or criteria, utility measures, 
etc. . 


The concepts we have introduced above are described in relatively imprecise terms. 
A more rigorous definition can be given if we set the theory in the realm of decision 
analysis where decision trees give a representation of the dependence of outcomes on 
actions and uncertainties . This will be called the extensive form of a game. 


2.2 Games in Extensive Form 


A game in extensive form is a graph (i.e. a set of nodes and a set of arcs) which has the 
structure of a tree? and which represents the possible sequence of actions and random 
perturbations which influence the outcome of a game played by a set of players. 


2.2.1 Description of moves, information and randomness 


A game in extensive form is described by a set of players, including one particular 
player called Nature , and a set of positions described as nodes on a tree structure. At 
each node one particular player has the right to move, i.e. he has to select a possible 
action in an admissible set represented by the arcs emanating from the node. 


The information at the disposal of each player at the nodes where he has to select 
an action is described by the information structure of the game . In general the player 





confusion and we will refer in this book to a singular genderless agent as “he” and the agent’s possession 
as “his”. 

>A similar concept has been introduced in control theory under the name of relaxed controls. 

3A tree is a graph where all nodes are connected but there are no cycles. In a tree there is a single 
node without ”parent”, called the ’root” and a set of nodes without descendants, the leaves”. There is 
always a single path from the root to any leaf. 
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may not know exactly at which node of the tree structure the game is currently located. 
His information has the following form: 


he knows that the current position of the game is an element in a given 
subset of nodes. He does not know which specific one it is. 


When the player selects a move, this correponds to selecting an arc of the graph which 
defines a transition to a new node, where another player has to select his move, etc. 
Among the players, Nature is playing randomly, i.e. Nature’s moves are selected at 
random. The game has a stopping rule described by terminal nodes of the tree. Then 
the players are paid their rewards, also called payoffs . 


Figure 2.1 shows the extensive form of a two-player, one-stage stochastic game 
with simultaneous moves. We also say that this game has the simultaneous move in- 
formation structure . It corresponds to a situation where Player 2 does not know which 
action has been selected by Player 1 and vice versa. In this figure the node marked D, 
corresponds to the move of player 1, the nodes marked D> correspond to the move of 
Player 2. 


The information of the second player is represented by the oval box. Therefore 
Player 2 does not know what has been the action chosen by Player 1. The nodes 
marked E correspond to Nature’s move. In that particular case we assume that three 
possible elementary events are equiprobable. The nodes represented by dark circles 
are the terminal nodes where the game stops and the payoffs are collected. 


This representation of games is obviously inspired from parlor games like Chess , 
Poker, Bridge, etc which can be, at least theoretically, correctly described in this 
framework. In such a context, the randomness of Nature ’s play is the representation 
of card or dice draws realized in the course of the game. 


The extensive form provides indeed a very detailed description of the game. It is 
however rather non practical because the size of the tree becomes very quickly, even 
for simple games, absolutely huge. An attempt to provide a complete description of a 
complex game like Bridge , using an extensive form, would lead to a combinatorial ex- 
plosion. Another drawback of the extensive form description is that the states (nodes) 
and actions (arcs) are essentially finite or enumerable. In many models we want to deal 
with, actions and states will also often be continuous variables. For such models, we 
will need a different method of problem description. 


Nevertheless extensive form is useful in many ways. In particular it provides the 
fundamental illustration of the dynamic structure of a game. The ordering of the se- 
quence of moves, highlighted by extensive form, is present in most games. Dynamic 
games theory is also about sequencing of actions and reactions. Here, however, dif- 
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Figure 2.1: A game in extensive form 


ferent mathematical tools are used for the representation of the game dynamics. In 
particular, differential and/or difference equations are utilized for this purpose. 


2.2.2 Comparing Random Perspectives 


Due to Nature’s randomness, the players will have to compare and choose among 
different random perspectives in their decision making. The fundamental decision 
structure is described in Figure 2.2. If the player chooses action a; he faces a random 
perspective of expected value 100. If he chooses action az he faces a sure gain of 100. 
If the player is risk neutral he will be indifferent between the two actions. If he is risk 
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averse he will choose action az, if he is risk lover he will choose action a,. In order to 


100 





200 





100 


Figure 2.2: Decision in uncertainty 


represent the attitude toward risk of a decision maker Von Neumann and Morgenstern 
introduced the concept of cardinal utility [Von Neumann & Morgenstern 1944]. If one 
accepts the axioms of utility theory then a rational player should take the action which 
leads toward the random perspective with the highest expected utility . 


This solves the problem of comparing random perspectives. However this also 
introduces a new way to play the game. A player can set a random experiment in order 
to generate his decision. Since he uses utility functions the principle of maximization 
of expected utility permits him to compare deterministic action choices with random 
ones. 


As a final reminder of the foundations of utility theory let’s recall that the Von Neumann- 
Morgenstern utility function is defined up to an affine transformation. This says that 
the player choices will not be affected if the utilities are modified through an affine 
transformation. 
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2.3 Additional concepts about information 


What is known by the players who interact in a game is of paramount importance. We 
refer briefly to the concepts of complete and perfect information. 


2.3.1 Complete and perfect information 


The information structure of a game indicates what is known by each player at the time 
the game starts and at each of his moves. 


Complete vs Incomplete Information 


Let us consider first the information available to the players when they enter a game 
play. A player has complete information if he knows 

e who the players are 

e the set of actions available to all players 

e all possible outcomes to all players. 
A game with complete information and common knowledge is a game where all play- 


ers have complete information and all players know that the other players have com- 
plete information. 


Perfect vs Imperfect Information 


We consider now the information available to a player when he decides about specific 
move. In a game defined in its extensive form, if each information set consists of just 
one node, then we say that the players have perfect information . If that is not the case 
the game is one of imperfect information . 


Example 2.3.1 A game with simultaneous moves, as e.g. the one shown in Figure 2.1, 
is of imperfect information. 
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Perfect recall 


If the information structure is such that a player can always remember all past moves 
he has selected, and the information he has received, then the game is one of perfect 
recall. Otherwise it is one of imperfect recall. 


2.3.2 Commitment 


A commitment is an action taken by a player that is binding on him and that is known 
to the other players. In making a commitment a player can persuade the other players 
to take actions that are favorable to him. To be effective commitments have to be 
credible. A particular class of commitments are threats . 


2.3.3 Binding agreement 


Binding agreements are restrictions on the possible actions decided by two or more 
players, with a binding contract that forces the implementation of the agreement. Usu- 
ally, to be binding an agreement requires an outside authority that can monitor the 
agreement at no cost and impose on violators sanctions so severe that cheating is pre- 
vented. 


2.4 Games in Normal Form 


2.4.1 Playing games through strategies 


Let M = {1,...,m)} be the set of players. A pure strategy y; for Player j is a mapping 
which transforms the information available to Player j at a decision node where he is 
making a move into his set of admissible actions. We call strategy vector the m-tuple 
y = (q)j=1,...m. Once a strategy is selected by each player, the strategy vector y is 
defined and the game is played as it were controlled by an automaton‘. 


An outcome (expressed in terms of expected utility to each player if the game 
includes chance nodes) is associated with a strategy vector y. We denote by I’; the set 





“This idea of playing games through the use of automata will be discussed in more details when we 
present the folk theorem for repeated games in Part II 
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of strategies for Player 7. Then the game can be represented by the m mappings 
V:i eee De -Im >R, jEM 


that associate a unique (expected utility) outcome V;(7) for each player j € M with 
a given strategy vector in y € Ty x ---I’; x ---T,. One then says that the game is 
defined in its normal form. 


2.4.2 From the extensive form to the strategic or normal form 


We consider a simple two-player game, called “matching pennies”. The rules of the 
game are as follows: 


The game is played over two stages. At first stage each player chooses 
head (H) or tail (T) without knowing the other player’s choice. Then they 
reveal their choices to one another. If the coins do not match, Player 1 
wins $5 and Payer 2 wins -$5. If the coins match, Player 2 wins $5 and 
Payer | wins -$5. At the second stage, the player who lost at stage 1 has 
the choice of either stopping the game or playing another penny matching 
with the same type of payoffs as in the first stage (Q, H, T). 


The extensive form tree 


This game is represented in its extensive form in Figure 2.3. The terminal payoffs rep- 
resent what Player 1 wins; Player 2 receives the opposite values. We have represented 
the information structure in a slightly different way here. A dotted line connects the 
different nodes forming an information set for a player. The player who has the move 
is indicated on top of the graph. 


Listing all strategies 


In Table 2.1 we have identified the 12 different strategies that can be used by each of 
the two players in the game of Matching pennies. Each player moves twice. In the 
first move the players have no information; in the second move they know what have 
been the choices made at first stage. We can easily identify the whole set of possible 
strategies. 
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Figure 2.3: The extensive form tree of the matching pennies game 











P1 P2 P1 P2 P1 ‘ 
H 10 
O 
T 0 
H H H 0 
H T T 10 
-5 
T Q 
H 0 
-10 
T s 10 
T 0 
Q -5 
, 0 
H 7 i 
T T -10 
Q T 0 
~ 5 
T ~- 10 
H 
T 
T 
H 
T™ 10 
Payoff matrix 


In Table 2.2 we have represented the payoffs that are obtained by Player 1 when both 
players choose one of the 12 possible strategies. 
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Strategies of Player 1 | Strategies of Player 2 

Ist scnd move Ist scnd move 

move | ifplayer2 | move | if player 1 

has played | move | has played 
H T H T 
1 H IQ H| H |H Q 
2 H IQ T| H IT Q 
3 H |H H| H |H H 
4 H |H T| H IT H 
5 H JIT H| H |H T 
6 H JIT T| H IT T 
7 T |H QI T IQ H 
8 T T QI T IQ T 
9 T |H H| T |H H 
10; T T H| T |H T 
11 T IH T| T IT H 
12 T T T| T IT T 























Table 2.1: List of strategies 


2.4.3 Mixed and Behavior Strategies 


Mixing strategies 


Since a player evaluates outcomes according to his VNM-utility functions he can envi- 
sion to “mix” strategies by selecting one of them randomly, according to a lottery that 
he will define. This introduces one supplementary chance move in the game descrip- 
tion. 


For example, if Player j has p pure strategies 7;,,k = 1,...,p he can select the 
strategy he will play through a lottery which gives a probability x ;;, to the pure strategy 
Yin k = 1,...,p. Now the possible choices of action by Player j are elements of the 
set of all the probability distributions 


We note that the set X; is compact and convex in R?. 
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1 Zales 4| 5 6| 7 8 9| 10}; 11} 12 

1} -5} -5| -5} -5| -5| -5 | -5] -5 O 0| 10; 10 

2| -5| -5| -5| -5| -5| -5 5 5/ 10| 10; Of O 

3 O|-10| 0J-10|-10] Of 5 5 0} 0| 10; 10 

4|-10| 0O/-10} 0J-I0| 0| 5 5/ 10| 10| Of 0 

5 0|-10} O/;-10} Oj] -10} 5 5 0} 0| 10; 10 

6| 0-10) O/;-10}; 0]-10| 5 5/ 10| 10; Of O 

7 5 5 0; O; 10| 10| -5| -5| -5| -5| -5 | -5 

8 5 5| 10| 10; 0f OF -5| -5| -5 5 5 5 

9 5 5 0; Of 10| 10|-10| 0J-I0| O/;-10; 0 

10) 5 5| 10| 10; Of Oj} -10} 0]|-10] Oj; -10;} O 
11 5 5 0; Of 10| 10| 0J-I0| 0Jf-10) 0-10 
12 5 5| 10| 10; Of OJ O/;-10} Oj} -10} O0J-10 















































Table 2.2: Payoff matrix 


Behavior strategies 


A behavior strategy is defined as a mapping which associates with the information 
available to Player 7 at a decision node where he is making a move a probability dis- 
tribution over his set of actions. 


The difference between mixed and behavior strategies is subtle. In a mixed strat- 
egy, the player considers the set of possible strategies and picks one, at random, ac- 
cording to a carefully designed lottery. In a behavior strategy the player designs a 
strategy that consists in deciding at each decision node, according to a carefully de- 
signed lottery, this design being contingent to the information available at this node. 
In summary we can say that a behavior strategy is a strategy that includes randomness 
at each decision node. A famous theorem [Kuhn, 1953], that we give without proof, 
establishes that these two ways of introding randomness in the choice of actions are 
equivalent in a large class of games. 


Theorem 2.4.1 In an extensive game of perfect recall all mixed strategies can be rep- 
resented as behavior strategies. 
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Chapter 3 


Solution concepts for noncooperative 
games 


3.1 introduction 


In this chapter we consider games described in their normal form and we propose 
different solution concept under the assumption that the players are non cooperating. 
In noncooperative games the players do not communicate between each other, they 
don’t enter into negotiation for achieving a common course of action. They know that 
they are playing a game. They know how the actions, their own and the actions of 
the other players, will determine the payoffs of every player. However they will not 
cooperate. 


To speak of a solution concept for a game one needs, first of all, to describe the 
game in its normal form. The solution of an m-player game will thus be a set of strategy 
vectors y that have attractive properties expressed in terms of the payoffs received by 
the players. 


Recall that an m-person game in normal form is defined by the following data 
{M, (T;), (V;) for j = My, 


where M is the set of players, M = {1, 2, ..., m}, and for each player j € M, T; is the 
set of strategies (also called the strategy space ) and V;, j € M, is the payoff function 
that assigns a real number V; (y) with a strategy vector y E€ Ta x Tg X... X Tm. 


In this chapter we shall study different classes of games in normal form. The 
first category consists in the so-called matrix games describing a situation where two 


players are in a complete antagonistic situation since what a player gains the other 


27 


28 CHAPTER 3. SOLUTION CONCEPTS FOR NONCOOPERATIVE GAMES 


player looses, and where each player has a finite choice of strategies. Matrix games 
are also called two player zero-sum finite games. The second category will consist 
of two player games, again with a finite strategy set for each player, but where the 
payoffs are not zero-sum. These are the nonzero-sum matrix games or bimatrix games . 
The third category, will be the so-called concave games that encompass the previous 
classes of matrix and bimatrix games and for which we will be able to prove nice 
existence, uniqueness and stability results for a noncooperative game solution concept 
called equilibrium . 


3.2 Matrix Games 


Definition 3.2.1 A game is zero-sum if the sum of the players’ payoffs is always zero. 
Otherwise the game is nonzero-sum. A two-player zero-sum game is also called a duel. 


Definition 3.2.2 A two-player zero-sum game in which each player has only a finite 
number of actions to choose from is called a matrix game. 


Let us explore how matrix games can be “solved”. We number the players 1 and 2 
respectively. Conventionally, Player 1 is the maximizer and has m (pure) strategies, 
say 7 = 1,2,...,m, and Player 2 is the minimizer and has n strategies to choose from, 
say j = 1,2,...,n. If Player 1 chooses strategy i while Player 2 picks strategy 7, then 
Player 2 pays Player 1 the amount a;;'. The set of all possible payoffs that Player 
1 can obtain is represented in the form of the m x n matrix A with entries a;; for 
i = 1,2,...,m and j = 1,2,...,n. Now, the element in the 7—th row and 7—th column 
of the matrix A corresponds to the amount that Player 2 will pay Player 1 if the latter 
chooses strategy 7 and the former chooses strategy j. Thus one can say that in the 
game under consideration, Player 1 (the maximizer) selects rows of A while Player 
2 (the minimizer) selects columns of that matrix, and as the result of the play, Player 
2 pays Player 1 the amount of money specified by the element of the matrix in the 
selected row and column. 


Example 3.2.1 Consider a game defined by the following matrix: 
3 1 8 
4 10 0 


'Negative payments are allowed. We could have said also that Player 1 receives the amount aij and 
Player 2 receives the amount —a;;. 
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The question now is what can be considered as players’ best strategies. 


One possibility is to consider the players’ security levels. It is easy to see that if 
Player 1 chooses the first row, then, whatever Player 2 does, Player 1 will get a payoff 
equal to at least 1 (util). By choosing the second row, on the other hand, Player 1 risks 
getting 0. Similarly, by choosing the first column Player 2 ensures that he will not have 
to pay more than 4, while the choice of the second or third column may cost him 10 
or 8, respectively. Thus we say that Player 1’s security level is 1 which is ensured by 
the choice of the first row, while Player 2’s security level is 4 and it is ensured by the 
choice of the first column. Notice that 


1 = max min dij 
i j 


and 


4 = min max aj; 
J (3 


which is the reason why the strategy which ensures that Player 1 will get at least the 
payoff equal to his security level is called his maximin strategy. Symmetrically, the 
strategy which ensures that Player 2 will not have to pay more than his security level 
is called his minimax strategy . 


Lemma 3.2.1 The following inequality holds 


max min a,; < MIN MAX aj;. (3.1) 
i i j i 


Proof: The proof of this result is based on the remark that, since both security levels 
are achievable, they necessarily satisfy the inequality (3.1). We give below a more 
detailed proof. 


We note the obvious set of inequalities 


min drj < Ak S MAX Gi (3.2) 


which holds for all possible k and l. More formally, let (i*, 7*) and (ix, ją) be defined 
by 


Qixj* = Max min Qij (3.3) 
ij 
and 
i,j, = Min Max aj; (3.4) 
J a 





7A util is the utility unit. 
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respectively. Now consider the payoff a;-;,. Then, by (3.2) with k = 7* and l = ją we 
get 


max(min a;;) < airj, < min(max dij). 
i j J k 


QED. 


An important observation is that if Player 1 has to move first and Player 2 acts 
having seen the move made by Player 1, then the maximin strategy is Player 1’s best 
choice which leads to the payoff equal to 1. If the situation is reversed and it is Player 
2 who moves first, then his best choice will be the minimax strategy and he will have 
to pay 4. Now the question is what happens if the players move simultaneously. The 
careful study of the example shows that when the players move simultaneously the 
minimax and maximin strategies are not satisfactory “solutions” to this game. Notice 
that the players may try to improve their payoffs by anticipating each other’s strategy. 
In the result of that we will see a process which in this case will not converge to any 
solution. 


Consider now another example. 


Example 3.2.2 Let the matrix game A be given as follows 


10 —15* 20 
20 —30 40 
30 —45 60 


Can we find satisfactory strategy pairs? 


It is easy to see that 


max min a;; = max{—15, —30, —45} = —15 
5 


and 


min max a;; = min{30, —15, 60} = —15 
j i 


and that the pair of maximin and minimax strategies is given by 


(i, j) = (1,2). 


That means that Player 1 should choose the first row while Player 2 should select the 
second column, which will lead to the payoff equal to -15.9 


In the above example, we can see that the players’ maximin and minimax strategies 
“solve” the game in the sense that the players will be best off if they use these strategies. 
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3.2.1 Saddle-Points 


Let us explore in more depth this class of strategies that solve the zero-sum matrix 
game. 


Definition 3.2.3 [fin a matrix game A = [aj;\i=1,....m:j=1,....n; there exists a pair (i*, j*) 
such that, for all il,...,m and ĵj1,... n 





Qijt S Ainge S Airy 3:5) 


we say that the pair (i*, j*) is a saddle point in pure strategies for the matrix game. 


As an immediate consequence, we see that, at a saddle point of a zero-sum game, the 
security levels of the two players are equal, i.e. , 


max min aj; = Min Max ay; = Qir j*. 
i j j i 


What is less obvious is the fact that, if the security levels are equal then there exists a 
saddle point. 


Lemma 3.2.2 If, in a matrix game, the following holds 
max min aj; = Min max a;; = V 
i j j i 


then the game admits a saddle point in pure strategies. 


Proof: Let i* and 7* be a strategy pair that yields the security level payoffs v (resp. 
—v) for Player 1 (resp. Player 2). We thus have for all ¿ = 1,...,m and j =1,...,n 


airj > min aje; = max Min dij (3.6) 
diye < Max dijs = min max dij. (3.7) 


Since 


max min a;j = Min MAX aj; = Ajj» = V 
a j j a 


by (3.6)-(3.7) we obtain 


lijs S gage S Qirj 


which is the saddle point condition. QED. 


Saddle point strategies provide a solution to the game problem even if the players 
move simultaneously. Indeed, in Example 3.2.2, if Player 1 expects Player 2 to choose 
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the second column, then the first row will be his optimal choice. On the other hand, 
if Player 2 expects Player 1 to choose the first row, then it will be optimal for him 
to choose the second column. In other words, neither player can gain anything by 
unilaterally deviating from his saddle point strategy. Each strategy constitutes the best 
reply the player can have to the strategy choice of his opponent. This observation leads 
to the following definition. 


Remark 3.2.1 Using strategies 1* and j*, Players 1 and 2 cannot improve their payoff 
by unilaterally deviating from (i)* or ((j)*) respectively. We call such strategies an 
equilibrium. 


Saddle point strategies, as shown in Example 3.2.2, lead to both an equilibrium and 
a pair of guaranteed payoffs. Therefore such a strategy pair, if it exists, provides a 
solution to a matrix game, which is “good” in that rational players are likely to adopt 
this strategy pair. 


3.2.2 Mixed strategies 


We have already indicated in chapter 2 that a player could “mix” his strategies by 
resorting to a lottery to decide what to play. A reason to introduce mixed strategies in 
a matrix game is to enlarge the set of possible choices. We have noticed that, like in 
Example 3.2.1, many matrix games do not possess saddle points in the class of pure 
strategies. However Von Neumann proved that saddle point strategy pairs exist in the 
class of mixed strategies. Consider the matrix game defined by an m x n matrix A. 
(As before, Player 1 has m strategies, Player 2 has n strategies). A mixed strategy for 
Player 1 is an m-tuple 
(Vig Pope) 


where x; are nonnegative for i = 1, 2, ...,m, and z1 + %2+...+%m = 1. Similarly, a 
mixed strategy for Player 2 is an n-tuple 


Y = (Y1, Y2, Yn) 


where y; are nonnegative for j = 1,2,...,n, and y1 + yo +... t Ym = 1. 


Note that a pure strategy can be considered as a particular mixed strategy with 
one coordinate equal to one and all others equal to zero. The set of possible mixed 
strategies of Player 1 constitutes a simplex in the space R™. This is illustrated in 
Figure 3.1 for m = 3. Similarly the set of mixed strategies of Player 2 is a simplex in 
R”. A simplex is, by construction, the smallest closed convex set that contains n + 1 
points in R”. 
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T3 








Figure 3.1: The simplex of mixed strategies 


The interpretation of a mixed strategy, say x, is that Player 1, chooses his pure 
strategy 2 with probability x;, 7 = 1,2,...,m. Since the two lotteries defining the 
random draws are independent events, the joint probability that the strategy pair (i, 7) 
be selected is given by x; yj. Therefore, with each pair of mixed strategies (x, y) we 
can associate an expected payoff given by the quadratic expression (in x. y)? 


T 
1 


m n 
i=l j= 


The first important result of game theory proved in [Von Neumann 1928] showed 
the following 


Theorem 3.2.1 Any matrix game has a saddle point in the class of mixed strategies, 
i.e. , there exist probability vectors x and y such that 


max min x7 Ay = min max a! Ay = («*)" Ay* = v* 
x y y x 


where v* is called the value of the game. 





3The superscript T denotes the transposition operator on a matrix. 
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We shall not repeat the complex proof given by von Neumann. Instead we shall relate 
the search for saddle points with the solution of linear programs. A well known duality 
property will give us the saddle point existence result. 


3.2.3 Algorithms for the Computation of Saddle-Points 


Matrix games can be solved as linear programs. It is easy to show that the following 
two relations hold: 


m 
v* = max min x? Ay = max min $` Tillij (3.8) 
and 
n 
* A T : 
z* = min max xr“ Ay = min max Qi; 3.9 
y 2 y y ; 29 ij ( ) 


These two relations imply that the value of the matrix game can be obtained by solving 
any of the following two linear programs: 


1. Primal problem 


max v 
subject to 
m 
S XC taij, PEL 2 gT 


i=1 
m 
i=1 


0, i=1,2,..m 


IV 


Ti 
2. Dual problem 


min 


x 


subject to 


Z 


IV 


n 
X yids, 7=1,2,...,m 
j=l 


I 
Me 


Yj 


II 
mn 


co: 


Yj 2 S12. 


, 


The following theorem relates the two programs together. 


Theorem 3.2.2 (Von Neumann [Von Neumann 1928]): Any finite two-person zero- 
sum matrix game A has a value 
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proof: The value v* of the zero-sum matrix game A is obtained as the common op- 
timal value of the following pair of dual linear programming problems. The respective 
optimal programs define the saddle-point mixed strategies. 


Primal Dual 
max Vv min Zz 
subject to x’A> v1" | subjectto Ay < z1 
ol = 1Ty=1 
xr>0 y= 0 
where 
1 
1=/1 
1 


denotes a vector of appropriate dimension with all components equal to 1. One needs 
to solve only one of the programs. The primal and dual solutions give a pair of saddle 
point strategies. e 


Remark 3.2.2 Simple n x n games can be solved more easily (see [Owen, 1982]]). 
Suppose Ais an n x n matrix game which does not have a saddle point in pure strate- 
gies. The players’ unique saddle point mixed strategies and the game value are given 
by: 


17 A? 
API 

Pee (68.11) 
detA 


where AP is the adjoint matrix of A, det A the determinant of A, and 1 the vector of 
ones as before. 


Let us illustrate the usefulness of the above formulae on the following example. 


Example 3.2.3 We want to solve the matrix game 


Ei 
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The game, obviously, has no saddle point (in pure strategies). The adjoint A” is 

2 0 

1 1 
and 1A? = [3 1], A?17 = [2 2], 1A?17 = 4, det A = 2. Hence the best mixed 
strategies for the players are: 


3 1 1 
t= re A y= l5; a! 
and the value of the play is: 
1 
uS 


In other words in the long run Player 1 is supposed to win .5 if he uses the first row 
75% of times and the second 25% of times. The Player 2’s best strategy will be to use 
the first and the second column 50% of times which ensures him a loss of .5 (only; 
using other strategies he is supposed to lose more). 


3.3 Bimatrix Games 


We shall now extend the theory to the case of a nonzero sum game. A bimatrix game 
conveniently represents a two-person nonzero sum game where each player has a finite 
set of possible pure strategies. 


In a bimatrix game there are two players, say Player 1 and Player 2 who have m and 
n pure strategies to choose from respectively. Now, if the players select a pair of pure 
strategies, say (7, 7), then Player 1 obtains the payoff a;; and Player 2 obtains b;;, where 
a,j and b;; are some given numbers. The payoffs for the two players corresponding to 
all possible combinations of pure strategies can be represented by two m x n payoff 
matrices A and B with entries a,; and b;; respectively (from here the name). 


Notice that a (zero-sum) matrix game is a bimatrix game where b;; = —a,;;. When 
aij + bi; = 0, the game is a zero-sum matrix game. Otherwise, the game is nonzero- 
sum. (As a;; and b;; are the players’ payoff this conclusion agrees with Definition 
3.2.1.) 


Example 3.3.1 Consider the bimatrix game defined by the following matrices 


52 44 44 
42 46 39 


and 


50 44 41 
42 49 43| 


3.3. BIMATRIX GAMES 37 


It is often convenient to combine the data contained in two matrices and write it in the 
form of one matrix whose entries are ordered pairs (a;;, b;;). In this case one obtains 


(52,50)* (44,44) (44,41) 
(42,42) (46,49)* (39,43) |” 


In the above bimatrix some cells have been indicated with asterisks *. They correspond 
to outcomes resulting from equilibra , a concept that we shall introduce now. 


If Player 1 chooses strategy “row 1”, the best reply by Player 2 is to choose strategy 
“column 1”, and vice versa. Therefore we say that the outcome (52, 50) is associated 
with the equilibrium pair “row 1,column 1”. 


The situation is the same for the outcome (46, 49) that is associated with another 
equilibrium pair “row 2,column 2”. 


We already see on this simple example that a bimatrix game can have many equi- 
libria. However there are other examples where no equilibrium can be found in pure 
strategies. So, as we have already done with (zero-sum) matrix games, let us expand 
the strategy sets to include mixed strategies. 


3.3.1 Nash Equilibria 


Assume that the players may use mixed strategies . For zero-sum matrix games we have 
shown the existence of a saddle point in mixed strategies which exhibits equilibrium 
properties. We formulate now the Nash equilibrium concept for the bimatrix games. 
The same concept will be defined later on for a more general m-player case. 


Definition 3.3.1 A pair of mixed strategies (x*,y*) is said to be a Nash equilibrium of 
the bimatrix game if 


1. (a*)? A(y*) > £T A(y*) for every mixed strategy x, and 


2. (a*)? B(y*) > (a*)" By for every mixed strategy y. 


In an equilibrium, no player can improve his payoff by deviating unilaterally from his 
equilibrium strategy. 
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Remark 3.3.1 A Nash equilibrium extends to nonzero sum games the equilibrium 
property that was observed for the saddle point solution to a zero sum matrix game. 
The big difference with the saddle point concept is that, in a nonzero sum context, the 
equilibrium strategy for a player does not guarantee him that he will receive at least the 
equilibrium payoff. Indeed if his opponent does not play “well”, i.e. does not use the 
equilibrium strategy, the outcome of a player can be anything; there is no guarantee. 


Another important step in the development of the theory of games has been the follow- 
ing theorem [Nash, 1951] 


Theorem 3.3.1 Every finite bimatrix game has at least one Nash equilibrium in mixed 
strategies. 


Proof: A more general existence proof including the case of bimatrix games will be 
given in the next section.e 


3.3.2 Shortcommings of the Nash equilibrium concept 
Multiple equilibria 


As noticed in Example 3.3.1 a bimatrix game may have several equilibria in pure strate- 
gies. There may be additional equilibria in mixed strategies as well. The nonunique- 
ness of Nash equilibria for bimatrix games is a serious theoretical and practical prob- 
lem. In Example 3.3.1 one equilibrium strictly dominates the other, ie., gives both 
players higher payoffs. Thus, it can be argued that even without any consultations the 
players will naturally pick the strategy pair (i, 7) = (1,1). 


It is easy to define examples where the situation is not so clear. 


Example 3.3.2 Consider the following bimatrix game 


Ron ve 
(0,0) (1,2)* 


It is easy to see that this game has two equilibria (in pure strategies), none of which 
dominates the other. Moreover, Player 1 will obviously prefer the solution (1,1), while 
Player 2 will rather have (2,2). It is difficult to decide how this game should be played 
if the players are to arrive at their decisions independently of one another. 
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The prisoner’s dilemma 


There is a famous example of a bimatrix game, that is used in many contexts to argue 
that the Nash equilibrium solution is not always a “good” solution to a noncooperative 
game. 


Example 3.3.3 Suppose that two suspects are held on the suspicion of committing 
a serious crime. Each of them can be convicted only if the other provides evidence 
against him, otherwise he will be convicted as guilty of a lesser charge. By agreeing 
to give evidence against the other guy, a suspect can shorten his sentence by half. 
Of course, the prisoners are held in separate cells and cannot communicate with each 
other. The situation is as described in Table 3.1 with the entries giving the length of the 
prison sentence for each suspect, in every possible situation. In this case, the players 
are assumed to minimize rather than maximize the outcome of the play. The unique 














Suspect I: Suspect II: refuses | agrees to testify 
refuses (2,2) (10,1) 
agrees to testify (1,10) (5, 5)* 























Table 3.1: The Prisoner’s Dilemma. 


Nash equilibrium of this game is given by the pair of pure strategies (agree — to — 
testify, agree — to — testify) with the outcome that both suspects will spend five 
years in prison. This outcome is strictly dominated by the strategy pair (re fuse — to — 
testify, refuse — to — testify), which however is not an equilibrium and thus is not 
a realistic solution of the problem when the players cannot make binding agreements. 


This example shows that Nash equilibria could result in outcomes being very far from 
efficiency. 


3.3.3 Algorithms for the Computation of Nash Equilibria in Bima- 
trix Games 


Linear programming is closely associated with the characterization and computation 
of saddle points in matrix games. For bimatrix games one has to rely to algorithms 
solving either quadratic programming or complementarity problems. There are also a 
few algorithms (see [Aumann, 1989], [Owen, 1982]) which permit us to find an equi- 
librium of simple bimatrix games. We will show one for a 2 x 2 bimatrix game and 
then introduce the quadratic programming [Mangasarian and Stone, 1964] and com- 
plementarity problem [Lemke & Howson 1964] formulations. 
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Equilibrium computation in a 2 x 2 bimatrix game 


For a simple 2 x 2 bimatrix game one can easily find a mixed strategy equilibrium as 
shown in the following example. 


Example 3.3.4 Consider the game with payoff matrix given below. 


Notice that this game has no pure strategy equilibrium. Let us find a mixed strategy 
equilibrium. 


Assume Player 2 chooses his equilibrium strategy y (i.e. 100y% of times use 


first column, 100(1 — y)% times use second column) in such a way that Player 1 (in 
equilibrium) will get as much payoff using first row as using second row i.e. 


1 
y+ O(L—y) = sy + 1(1—y). 
This is true for y* = 2, 
Symmetrically, assume Player 1 is using a strategy x (i.e. 100x% of times use first 


row, 100(1 — y)% times use second row) such that Player 2 will get as much payoff 
using first column as using second column i.e. 


1 
Ox + 3(1— 2) = 1s + 0(1 — 2). 


) and (4). 


This is true for x* = E The players’ payoffs will be, respectively, (2 


Then the pair of mixed strategies 
(x”, 1— gr); (y*, De y“) 


is an equilibrium in mixed strategies. 


Links between quadratic programming and Nash equilibria in bimatrix games 


Mangasarian and Stone (1964) have proved the following result that links quadratic 
programming with the search of equilibria in bimatrix games. Consider a bimatrix 
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game (A, B). We associate with it the quadratic program 


max lx" Ay + 27 By — v' — v7] (3.13) 
s.t. 
Ay < vtlm (3.14) 
B'r < 41, (3.15) 
x,y 2 0 (3.16) 
LTlm 1 (3.17) 
y ln = 1 (3.18) 
viv’, E€ R. (3.19) 


Lemma 3.3.1 The following two assertions are equivalent 


(i) (x,y, v', v?) is a solution to the quadratic programming problem (3.13)- (3.19) 


(ii) (x,y) is an equilibrium for the bimatrix game. 


Proof: From the constraints it follows that x7 Ay < v! and x? By < v? for any 
feasible (x,y, vt, v?). Hence the maximum of the program is at most 0. Assume that 
(x, y) is an equilibrium for the bimatrix game. Then the quadruple 


is feasible, i.e. satisfies (7.2)-(3.19), and gives to the objective function (7.1) a value 
0. Hence the equilibrium defines a solution to the quadratic programming problem 
(7.1)-(3.19). 

Conversely, let (£x, Yx, V}, v2) be a solution to the quadratic programming problem 
(7.1)- (3.19). We know that an equilibrium exists for a bimatrix game (Nash theorem). 
We know that this equilibrium is a solution to the quadratic programming problem 
(7.1)-(3.19) with optimal value 0. Hence the optimal program (£4, Yx, v1, v2) must also 
give a value 0 to the objective function and thus be such that 


xrl Ay, +T By, = vitv?. (3.20) 


For any x > 0 and y > 0 such that 271,, = 1 and y71,, = 1 we have, by (3.17) and 
(3.18) 


x? Ay, < i 
Dy SU, 
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In particular we must have 


x, Ays < R 

x, Bys < F 
These two conditions with (3.20) imply 

rl Ay, = vi 

rT By, = v. 


Therefore we can conclude that For any x > 0 and y > 0 such that x? 1,, = 1 and 
y? 1, = 1 we have, by (3.17) and (3.18) 


az’ Ay, < 
ai By < 


and hence, (x,, Yx) is a Nash equilibrium for the bimatrix game. QED 


A Complementarity Problem Formulation 


We have seen that the search for equilibria could be done through solving a quadratic 
programming problem. Here we show that the solution of a bimatrix game can also be 
obtained as the solution of a complementarity problem . 


There is no loss in generality if we assume that the payoff matrices are m x n and 
have only positive entries (A > 0 and B > 0). This is not restrictive, since VNM 
utilities are defined up to an increasing affine transformation. A strategy for Player 1 
is defined as a vector x € R” that satisfies 


r > 0 (3.21) 
co ae (3.22) 

and similarly for Player 2 
y = 0 (3.23) 


It is easily shown that the pair (2*, y*) satisfying (3.21)-(3.24) is an equilibrium iff 


(xt A yY*)Lm 
(z* B y*“)ln 


Ay* (A > 0) 


BTz* (B>0) oo 


2 
2z 


i.e. if the equilibrium condition is satisfied for pure strategy alternatives only. 
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Consider the following set of constraints with vı € R and v2 E€ R 


z“ (Ay* — ulm) = 0 


3.26 
y* (BT a* — valn) ene) 


and 


The relations on the right are called complementarity constraints . For mixed strategies 
(x*,y*) satisfying (3.21)-(3.24), they simplify to x*° Ay* = v1, «* By* = v2. This 
shows that the above system (3.26) of constraints is equivalent to the system (3.25). 


Define sı = x/v2, 82 = y/v, and introduce the slack variables uw; and us, the 
system of constraints (3.21)-(3.24) and (3.26) can be rewritten 


ui _ Te 0 A Sı 
u d S 
— 1 1 
0 = ( i ) ( . (3.28) 
ú T 
1 
( ie (3.29) 


S1 
( a ) : (3.30) 


Introducing four obvious new variables permits us to rewrite (3.27)- (3.30) in the 
generic formulation 


S 
IA 


D 
IA 


u q+ Ms (3.31) 
0 ul s (3.32) 
u > 0 (3.33) 
s > 0, (3.34) 


of a so-called a complementarity problem . 


A pivoting algorithm ([Lemke & Howson 1964], [Lemke 1965]) has been proposedto 
solve such problems. This algorithm applies also to quadratic programming , so this 
confirms that the solution of a bimatrix game is of the same level of difficulty as solving 
a quadratic programming problem. 


Remark 3.3.2 Once we obtain x and y, solution to (3.27)-(3.30) we shall have to 
reconstruct the strategies through the formulae 


s1/(st 1m) (3.35) 
s2/(si1,). (3.36) 
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3.4 Concave m-Person Games 


The nonuniqueness of equilibria in bimatrix games and a fortiori in m-player matrix 
games poses a delicate problem. If there are many equilibria, in a situation where 
one assumes that the players cannot communicate or enter into preplay negotiations, 
how will a given player choose among the different strategies corresponding to the 
different equilibrium candidates? In single agent optimization theory we know that 
strict concavity of the (maximized) objective function and compactness and convexity 
of the constraint set lead to existence and uniqueness of the solution. The following 
question thus arises 


Can we generalize the mathematical programming approach to a situation 
where the optimization criterion is a Nash-Cournot equilibrium? can we 
then give sufficient conditions for existence and uniqueness of an equilib- 
rium solution? 


The answer has been given by Rosen in a seminal paper [Rosen, 1965] dealing with 
concave m-person game . 


A concave m-person game is described in terms of individual strategies repre- 
sented by vectors in compact subsets of Euclidian spaces (R”” for Player j) and by 
payoffs represented, for each player, by a continuous functions which is concave w.r.t. 
his own strategic variable. This is a generalization of the concept of mixed strategies, 
introduced in previous sections. Indeed, in a matrix or a bimatrix game the mixed 
strategies of a player are represented as elements of a simplex, i.e. a compact con- 
vex set, and the payoffs are bilinear or multilinear forms of the strategies, hence, for 
each player the payoff is concave w.r.t. his own strategic variable. This structure is 
thus generalized in two ways: (i) the strategies can be vectors constrained to be in a 
more general compact convex set and (ii) the payoffs are represented by more general 
continuous-concave functions. 


Let us thus introduce the following game in strategic form 


e Each player j E€ M = {1,...,m} controls the action u; E€ U; a compact convex 
subset of R"’, where m; is a given integer, and gets a payoff w;(u1,...,Uj,.--,; Um), 
where Y; : U1 x... X Uj;,... X Um +> R is continuous in each u; and concave 
in uj 

j 


e A coupled constraint is defined as a proper subset U of U1 X... X U; X... X Um. 
The constraint is that the joint action u = (u1, .. . , Um) must be in U. 
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Definition 3.4.1 An equilibrium, under the coupled constraint U is defined as a deci- 
sion m-tuple (uj,... sft ocho ,ux,) E U such that for each player j € M 


Wy (Uy yess Uj eer tl) Z Wy Uy eng Wyss y Um) (3.37) 
forallus E€ U; Sh iy sg Uy os, U) © (3.38) 


Remark 3.4.1 The consideration of a coupled constraint is a new feature. Now each 
player’s strategy space may depend on the strategy of the other players. This may look 
awkward in the context of nonocooperative games where thw player cannot enter into 
communication or cannot coordinate their actions. However the concept is mathemat- 
ically well defined. We shall see later on that it fits very well some interesting aspects 
of environmental management. One can think for example of a global emission con- 
straint that is imposed on a finite set of firms that are competing on the same market. 
This environmental example will be further developed in forthcoming chapters. 


3.4.1 Existence of Coupled Equilibria 


Definition 3.4.2 A coupled equilibrium is a vector u* such that 
pj(u*) = max{pj(up,..., Ugs- -oy Um) (UT -+ -3 Uys ++ Um) © UY}. (3.39) 


At such a point no player can improve his payoff by a unilateral change in his strategy 
which keeps the combined vector in U. 


Let us first show that an equilibrium is actually defined through a fixed point condi- 
tion. For that purpose we introduce a so-called global reaction function 0: UxUu +> R 
defined by 


O(u,v,r) = SET igs sea Dias sey Um), (3.40) 
j=l 


where r; > 0, 7 = 1,...,m are given weights. The precise role of this weighting 
scheme will be explained later. For the moment we could take as well r; = 1. Notice 
that, even if u and v are in U, the combined vectors (u,...,U;,..., Um) are element 
of a larger set in U} x ... x Um. This function is continuous in u and concave in v 
for every fixed u. We call it a reaction function since the vector v can be interpreted as 
composed of the reactions of the different players to the given vector u. This function 
is helpful as shown in the following result. 


Lemma 3.4.1 Let u* € U be such that 


O(u*,u*,r) = max f(u“, u, r). (3.41) 


Then u* is a coupled equilibrium. 
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Proof: Assume u* satisfies (3.41) but is not a coupled equilibrium, i.e. does not 
satisfy (3.39). Then, for one player, say £, there would exist a vector 


OG st se opt, Ved 
such that 
Pelu,- Ut- Um) > Pel(u*). 
Then we shall also have 6(u*, ū) > @(u*, u*) which is a contradiction to (3.41). QED 


This result has two important consequences. 


1. It shows that proving the existence of an equilibrium reduces to proving that a 
fixed point exists for an appropriately defined reaction mapping (u* is the best 
reply to u* in (3.41)); 


2. it associates with an equilibrium an implicit maximization problem, defined in 
(3.41). We say that this problem is implicit since it is defined in terms of its 
solution u*. 


To make more precise the fixed point argument we introduce a coupled reaction map- 
ping. 


Definition 3.4.3 The point to set mapping 
T(u,r) = {v|@(u, v,r) = max 6(u, w,r)}. (3.42) 


is called the coupled reaction mapping associated with the positive weighting r. A 
fixed point of T (-,r) is a vector u* such that u* € T (u*,r). 


By Lemma 3.4.1 a fixed point of I'(-, r) is a coupled equilibrium. 


Theorem 3.4.1 For any positive weighting r there exists a fixed point of T (-,r), i.e. a 
point u* s.t. u* € [(u*,r). Hence a coupled equilibrium exists. 


Proof: The proof is based on the Kakutani fixed-point theorem that is given in the 
Appendix of section 3.8. One is required to show that the point to set mapping is upper 
semicontinuous. QED 


Remark 3.4.2 This existence theorem is very close, in spirit, to the theorem of Nash. 
It uses a fixed-point result which is “topological” and not “constructive”, i.e. it does 
not provide a computational method. However, the definition of a normalised equilib- 
rium introduced by Rosen establishes a link between mathematical programming and 
concave games with coupled constraints. 
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3.4.2 Normalized Equilibria 
Kuhn-Tucker multipliers 


Suppose that the coupled constraint (3.39) can be defined by a set of inequalities 
AD) k=1,...,p (3.43) 


where hy : U, x... X Um — R, k = 1,...,p, are given concave functions. Let 
us further assume that the payoff functions ~;(-) as well as the constraint functions 
h,(-) are continuously differentiable and satisfy the constraint qualification conditions 
so that Kuhn-Tucker multipliers exist for each of the implicit single agent optimization 
problems defined below. 


Assume all players other than Player j use their strategy už, i € M, i # j. Then the 
equilibrium conditions (3.37)-(3.39) define a single agent optimization problem with 
concave objective function and convex compact admissible set. Under the assumed 
constraint qualification assumption there exists a vector of Kuhn-Tucker multipliers 


grees 


LS uj], Ay) = dy (a, wy]) + SD Agha ((u?, wy]) (3.44) 


k=1...p 
verifies, at the optimum 
ð *x—j * 
0 < Aj (3.46) 
OS; Agel A a RS ace (3.47) 


Definition 3.4.4 We say that the equilibrium is normalized if the different multipliers 
A; for j € M are colinear with a common vector ^o, namely 


i 
Aj = =o (3.48) 


Tj 


where r; > 0 is a given weighting of the players. 


Actually, this common multiplier Ao is associated with the implicit mathematical pro- 
gramming problem 


max 0(u*, u, r) (3.49) 
to which we associate the Lagrangean 
Lo(u, Ao) = 5 rpl, ujl) T 5 Aokhg (u). (3.50) 


jEM k=1...p 
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and the first order necessary conditions 


o . 
0 = {rpl X Aoehe(u*)}, GEM (3.51) 
Ou; k=1,...,p 
< Xo (3.52) 
= Aochs(u") K=1,...,p. (3.53) 


An economic interpretation 


The multiplier, in a mathematical programming framework, can be interpreted as a 
marginal cost associated with the right-hand side of the constraint. More precisely it 
indicates the sensitivity of the optimum solution to marginal changes in this right-hand- 
side. The multiplier permits also a price decentralization in the sense that, through an 
ad-hoc pricing mechanism the optimizing agent is induced to satisfy the constraints. In 
a normalized equilibrium, the shadow cost interpretation is not so apparent; however, 
the price decomposition principle is still valid. Once the common multiplier has been 
defined, with the associated weighting r; > 0,7 = 1,...,m the coupled constraint will 
be satisfied by equilibrium seeking players, when they use as payoffs the Lagrangeans 


L 1], 24) = Wu) + = Ahela, u), 


J= lm. (3.54) 


The common multiplier permits then an “implicit pricing” of the common constraint 
so that it remains compatible with the equilibrium structure. Indeed, this result to be 
useful necessitates uniqueness of the normalized equilibrium associated with a given 
weighting r; > 0,7 =1,...,m. In a mathematical programming framework, unique- 
ness of an optimum results from strict concavity of the objective function to be max- 
imized. In a game structure, uniqueness of the equilibrium will result from a more 
stringent strict concavity requirement, called by Rosen strict diagonal concavity . 


3.4.3 Uniqueness of Equilibrium 


Let us consider the so-called pseudo-gradient defined as the vector 


rız—y(u) 


T2 ae W(u) 


glu, r) = (3.55) 


Tm 2ml) 
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We notice that this expression is composed of the partial gradients of the different 
payoffs with respect to the decision variables of the corresponding player. We also 
consider the function 


o(u,r) = So rj);(u) (3.56) 
j=l 


Definition 3.4.5 The function o(u,r) is diagonally strictly concave on U if, for every 
ut et u? in U, the following holds 


(u? — u')?g(u',r) + (u! — u?) g(u?, r) > 0. (3.57) 


A sufficient condition that co (u, r) be diagonally strictly concave is that the symmetric 
matrix [G(u,r) + G(u, r)T] be negative definite for any u! in U, where G(u, r) is the 
Jacobian of g(u?, r) with respect to u. 


Theorem 3.4.2 Ifo(u,r) is diagonally strictly concave on the convex set U, with the 
assumptions insuring existence of K.T. multipliers, then for every r > 0 there exists a 
unique normalized equilibrium 


Proof We sketch below the proof given by Rosen [Rosen, 1965]. Assume that for 
some r > 0 we have two equilibria u! and u?. Then we must have 


h(u') > 0 (3.58) 
hu’) > 0 (3.59) 
and there exist multipliers àt > 0, A? > 0, such that 

Mh(u!) = 0 (3.60) 
Ahu = 0 (3.61) 

and for which the following holds true for each player 7 € M 
r;Vu pu!) +0 V,,h(u!) = 0 (3.62) 
rj7Vub;(w2) +A Vahu?) = 0. (3.63) 





We multiply (3.62) by (u? — u!)T and (3.63) by (u! — u?)T and we sum together 
to obtain an expression 3 + y = 0, where, due to the concavity of the h; and the 
conditions (3.58)-(3.61) 


D (a? — w)PV, fg (ut) + 3(at — w2) PV, he(w?)} 


Y = 
> SOA" Ru?) — h(ul)] +A” [h(u!) — h(w?)}} 
= YN au’) + h(u')} > 0, (3.64) 


jEM 
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and 


b = Yo riu- u) Vayu) + (ul -— Ww’)? Vi,vj(u7)). 8.65) 


jEM 


Since c (u, r) is diagonally strictly concave we have 3 > 0 which contradicts G+ 7 = 0. 
QED 


3.4.4 A numerical technique 


The diagonal strict concavity property that yielded the uniqueness result of theorem 3.4.2 
also provides an interesting extension of the gradient method for the computation of 
the equilibrium. The basic idea is to ’project”, at each step, the pseudo gradient g(u, r) 
on the constraint set U = {u : h(u) > 0} (let’s call g(u‘,r) this projection) and to 
proceed through the usual steepest ascent step 

uit! = ut ts 7g(u’, r). 
Rosen shows that, at each step £ the step size T“ > 0 can be chosen small enough for 
having a reduction of the norm of the projected gradient. This yields convergence of 
the procedure toward the unique equilibrium. 


3.4.5 A variational inequality formulation 


Theorem 3.4.3 Under assumptions 3.5.1 the vector q* = (qj,.--,¢,,) is a Nash- 
Cournot equilibrium if and only if it satisfies the following variational inequality 


(q — q*)”g(q*) < 0, (3.66) 


where g(q*) is the pseudo-gradient at q* with weighting 1. 


Proof: Apply first order necessary and sufficient optimality conditions for each player 
and aggregate to obtain (3.66). QED 


Remark 3.4.3 The diagonal strict concavity assumption is then equivalent to the prop- 
erty of strong monotonicity of the operator g(q*) in the parlance of variational in- 
equality theory. 
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3.5 Cournot equilibrium 


The model proposed in 1838 by [Cournot, 1838] is still one of the most frequently used 
game model in economic theory. It represents the competition between different firms 
supplying a market for a divisible good. 


3.5.1 The static Cournot model 


Let us first of all recall the basic Cournot oligopoly model. We consider a single 
market on which m firms are competing. The market is characterized by its (inverse) 
the total supply of goods on the market. The firm j faces a cost of production Cg): 
hence, letting q = (qı, . - - , qm) represent the production decision vector of the m firms 
together, the profit of firm j is 7;(¢) = q; D(Q) — C;(q;). The following assumptions 
are placed on the model 


Assumption 3.5.1 The market demand and the firms cost functions satisfy the follow- 
ing (i) The inverse demand function is finite valued, nonnegative and defined for all 
Q € [0,00). It is also twice differentiable, with D'(Q) < 0 wherever D(Q) > 0. 
In addition D(0) > 0. (ii) Cj (q;) is defined for all q; € [0, 00), nonnegative, convex, 
twice continuously differentiable, and C"(q;) > 0. (iii) Q D(Q) is bounded and strictly 
concave for all Q such that D(Q) > 0. 


If one assumes that each firm j chooses a supply level q; € [0, q;], this model satis- 
fies the definition of a concave game à la Rosen (see exercise 3.7). There exists an 
equilibrium. Let us consider the uniqueness issue in the duopoly case. 


Consider the pseudo gradient (there is no need of weighting (r1, r2) since the con- 
straints are not coupled) 


_{ D(Q)+q D'(Q) - Cia) 
gq, q2) = ( D(Q) + & D'(Q) — Cila) 


and the jacobian matrix 


G(q, q2) = ( 2D"(Q) + a D”(Q) — C1 lq) D'(Q) + qı D"(Q) ) 





D'(Q) + @ D"(Q) 2D'(Q) + g2 D”(Q) — C1 l2) 
The negative definiteness of the symmetric matrix 
(Gla. a2) + G(a,4)") = 
( 2D'(Q) +q DQ) — C1 (q) D'(Q) +3Q D"(Q) ) 
D'(Q) + 3Q D"(Q) 2D'(Q) + q2 D"(Q) — C1 (Q2) 
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implies uniqueness of the equilibrium. 


3.5.2 Formulation of a Cournot equilibrium as a nonlinear com- 
plementarity problem 


We have seen, when we dealt with Nash equilibria in m-matrix games, that a solution 
could be found through the solution of a linear complementarity problem. A similar 
formulation holds for a Cournot equilibrium and it will in general lead to a nonlinear 
complementarity problem or NLCP, a class of problems that has recently been the 
object of considerable developments in the field of Mathematical Programming (see 
e.g. the book [Ferris & Pang, 1997a] and the survey paper [Ferris & Pang, 1997b]) We 
can rewrite the equilibrium condition as an NLCP in canonical form 


q > 0 (3.67) 
filq) > 0 (3.68) 
atila) = 0, (3.69) 
where 
f;(q) = —D(Q) — a; D'(Q) + Ci(q). (3.70) 


The relations (3.67-3.69) express that, at a Cournot equilibrium, either q; = 0 and the 
gradient g;(q) is < 0, or q; > 0 and g;(q) = 0. There are now optimization softwares 
that solve these types of problems. In [Rutherford, 1995] an extension of the GAMS 
(see [Brooke et al., 1992]) modeling software to provide an ability to solve this type of 
complementarity problem is described. More recently the modeling language AMPL 
(see [Fourer et al., 1993]) has also been adapted to handel a problem of this type and 
to submit it to an efficient NLCP solver like PATH (see [Ferris & Munson, 1994]). 


Example 3.5.1 This example comes from the AMPL website 
http://www.ampl.com/ampl/TRYAMPL/ 


Consider an oligopoly with n = 10 firms. The production cost function of firm j is 
given by 





tA Peep Bi g.) 1+6i)/ Bi 
ci(qi) = CiQi F 1 F Bi (Liqi) : (3.71) 
The market demand function is defined by 
D(Q) = (A/Q)"””. (3.72) 
Therefore 
1 A 1 
D'(Q) = +(4/Q) t- ) = -+D l 3.73 
(Q) a /Q) 0 ( g2) : (Q)/Q (3.73) 
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The Nash Cournot equilibrium will be the solution of the NLCP (3.67-3.69) where 


fq = —D(Q)-q D'(Q)+ Ci) 
= HAlQy et =4D(Q)/Q +o + (Liq). 


AMPL input 


SSSSSSSFILE nash.mod%%%%%%%S%S%% 
#==>nash.gms 
# A non-cooperative game example: a Nash equilibrium is sought. 


# References: 











# F.H. Murphy, H.D. Sherali, and A.L. Soyster, "A Mathematical 

# Programming Approach for Determining Oligopolistic Market 
# Equilibrium", Mathematical Programming 24 (1986), pp. 92- 
106. 

# P.T. Harker, "Accelerating the convergence . . .", Mathematical 
# Programming 41 (1988), pp. 29-59. 

set Rn := 1... 10; 

param gamma := 1.2; 


param c {i in Rn}; 
param beta {i in Rn}; 


param L {i in Rn} := 10; 
var q {i in Rn} >= 0; # production vector 
var Q = sum {i in Rn} qli]; 


var divQ = (5000/Q) ** (1/gamma) ; 


s.t. feas {i in Rn}: 
qli] >= 0 complements 
O <= c[i] + (L[i] * q[i])**(1/beta[i]) - divo 
- qli] * (-1/gamma) * divQ / Q; 





SSSSSSSEFILE nash. dat%%%%%%565%S 
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Fo WoONAA OB WN EF 
DN BFA WR OW UW 


`e 


param beta := 
L2 








Fo E o o S E o a A OB WBN FE 
UI 


o999099ọ9 o9909990909ọ 
SSSSSSCSFILE nash. runs%%%%%55S% 


model nash.mod; 
data nash.dat; 








set initpoint := 1 .. 4; 
param initval {Rn,initpoint} >= 0; 
data;param initval 



































: 23 4 := 

t-I 10 1.0 7 
21 10 1.2 4 
35L 10 1.4 3 
4 1 10 1.6 
aml 10 1.8 18 
6 1 10 2.1 4 
7 1 TO- 23] 
8 1 TO- 2526 
9 1 EO 2AT 
10 1 LOS 23:9" 2 





`e 


for {point in initpoint} 
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{ 

let{i in Rn} g[i] := initval[i,point]; 
solve; 

include compchk 


} 





3.5.3 Computing the solution of a classical Cournot model 


There are many approaches to find a solution to a static Cournot game. We have 
described above the one based on a solution of an NLCP. We list below a few other 
alternatives 


(a) one discretizes the decision space and uses the complementarity algorithm pro- 
posed in [Lemke & Howson 1964] to compute a solution to the associated bima- 
trix game; 


(b) one exploits the fact that the players are uniquely coupled through the condi- 


=S1,..., 


[Murphy,Sherali & Soyster, 1982]; 


(c) one formulates the Nash-Cournot equilibrium problem as a variational inequal- 
ity. Assumption 3.5.1 implies Strict Diagonal Concavity (SDC), in the sense of 
Rosen, i.e. strict monotonicity of the pseudo-gradient operator, hence unique- 
ness and convergence of various gradient like algorithms. 


3.6 Correlated equilibria 


Aumann, [Aumann, 1974], has proposed a mechanism that permits the players to enter 
into preplay arrangements so that their strategy choices could be correlated, instead 
of being independently chosen, while keeping an equilibrium property. He called this 
type of solution correlated equilibrium. 
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3.6.1 Example of a game with correlated equlibria 


Example 3.6.1 This example has been initially proposed by Aumann [Aumann, 1974]. 
Consider a simple bimatrix game defined as follows 











Cy C2 
71 15,1 | 0,0 
ro 14,4] 1,5 

















This game has two pure strategy equilibria (rı, cı) and (rz, c2) and a mixed strategy 
equilibrium where each player puts the same probality 0.5 on each possible pure strat- 
egy. The respective outcomes are shown in Table 3.2 Now, if the players agree to play 


(r1, C1) 5,1 
(r2, C2) 1,5 
(0.5 —0.5,0.5—0.5) : 2.5,2.5 


Table 3.2: Outcomes of the three equilibria 


by jointly observing a “coin flip” and playing (11, cı) if the result is “head”, (r2, C2) if 
it is “tail” then they expect the outcome 3, 3 which is a convex combination of the two 
pure equilibrium outcomes. 


By deciding to have a coin flip deciding on the equilibrium to play, a new type of 
equilibrium has been introduced that yields an outcome located in the convex hull of 
the set of Nash equilibrium outcomes of the bimatrix game. In Figure 3.2 we have 
represented these different outcomes. The three full circles represent the three Nash 
equilibrium outcomes. The triangle defined by these three points is the convex hull of 
the Nash equilibrium outcomes. The empty circle represents the outcome obtained by 
agreeing to play according to the coin flip mechanism. 


{ 5) 
3,3 
25,25 N” 


(5,1) 


» 








Figure 3.2: The convex hull of Nash equilibria 
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It is easy to see that this way to play the game defines an equilibrium for the exten- 
sive game shown in Figure 3.3. This is an expanded version of the initial game where, 
in a preliminary stage, “Nature” decides randomly the signal that will be observed by 
the players*. The result of the ”coin flip” is a “public information”, in the sense that 


(1,5) 
(4,4) 


(0,0) 


(5,1) 
(1,5) 


(4,4) 


(0,0) 





(5,1) 


Figure 3.3: Extensive game formulation with “Nature” playing first 


it is shared by all the players. One can easily check that the correlated equilibrium is 
a Nash equilibrium for the expanded game. 


We now push the example one step further by assuming that the players agree to 
play according to the following mechanism: A random device selects one cell in the 
game matrix with the probabilities shown in Figure 3.74. 





Cy | Cc 
rı A 


N 





(3.74) 








T2 











wi. © 





3 
When a cell is selected, then each player is told to play the corresponding pure strategy. 
The trick is that a player is told what to play but is not told what the recommendation to 
the other player is. The information received by each player is not “public” anymore. 
More precisely, the three possible signals are (r1, c1), (r2, c1), (r2, C2). When Player 1 





4Dotted lines represent information sets of Player 2. 
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receives the signal “play r2” he knows that with a probability 5 the other player has 
been told “play cı” and with a probability z the other player has been told “play 
c2”. When player I receives the signal “play rı” he knows that with a probability 1 
the other player has been told “play cı”. Consider now what Player I can do, if he 
assumes that the other player plays according to the recommendation. If Player 1 has 
been told “play rz” and if he plays so he expects 

1 1 
R 
if he plays rı instead he expects 
: 5 : 0= 2.5 
5 x54 5 xO = 2. 
so he cannot improve his expected reward. If Player I has been told “play rı” and 
if he plays so he expects 5, whereas if he plays rz he expects 4. So, for Player 1, 
obeying to the recommendation is the best reply to Player 2 behavior when he himself 
plays according to the suggestion of the signalling scheme. Now we can repeat the the 
verification for Player 2. If he has been told “play cı” he expects 

: 1 l A=: 

5 x1+ 5 x 4 = 2.5; 
if he plays cz instead he expects 

: 0 : S= 25 

5 x0+ 5 p= 2, 
so he cannot improve. If he has been told “play c2” he expects 5 whereas if he plays 
cı instead he expects 4 so he is better off with the suggested play. So we have checked 
that an equilibrium property holds for this way of playing the game. All in all, each 


player expects 


ae Mee ee eee N. 
3 re SB im 3 


froma game played in this way. This is illustrated in Figure 3.4 where the black spade 
shows the expected outcome of this mode of play. Auman called it a ” correlated equi- 
librium”. Indeed we can now mix these equilibria and still keep the correlated equi- 
librium property, as indicated by the doted line on Figure 3.4; also we can construct 
an expanded game in extensive form for which the correlated equilibrium constructed 
as above defines a Nash equilibrium (see Exercise 3.6). 





In the above example we have seen that, by expanding the game via the adjunction of 
a first stage where “Nature” plays and gives information to the players, a new class 
of equilibria can be reached that dominate, in the outcome space, some of the original 
Nash-equilibria. If the random device gives an information which is common to all 
players, then it permits a mixing of the different pure strategy Nash equilibria and the 
outcome is in the convex hull of the Nash equilibrium outcomes. If the random device 
gives an information which may be different from one player to the other, then the 
correlated equilibrium can have an outcome which lies out of the convex hull of Nash 
equilibrium outcomes. 


3.6. CORRELATED EQUILIBRIA 59 


| 5) 





(3.33,3.33) 
(3,3 
(2.5,2.5) 


è (5,1) 


> 








Figure 3.4: The dominating correlated equilibrium 


3.6.2 A general definition of correlated equilibria 


Let us give a general definition of a correlated equilibrium in an m-player normal form 
game. We shall actually give two definitions. The first one describes the construct 
of an expanded game with a random device distributing some pre-play information to 
the players. The second definition which is valid for m-matrix games is much simpler 
although equivalent. 


Nash equilibrium in an expanded game 


Assume that a game is described in normal form, with m players 7 = 1,...,m, their 
respective strategy sets I; and payoffs V;(y1,---, Yj, ---, Ym). This will be called the 
original normal form game. 


Assume the players may enter into a phase of pre-play communication during 
which they design a correlation device that will provide randomly a signal called pro- 
posed mode of play. Let E = {1,2,..., L} be the finite set of the possible modes of 
play. The correlation device will propose with probability A(/) the mode of play £. 
The device will then give the different players some information about the proposed 
mode of play. More precisely, let H; be a class of subsets of F, called the information 
structure of player j. Player j, when the mode of play £ has been selected, receives an 
information denoted h;(£) € H;. Now, we associate with each player j a meta strategy 
denoted 7; : H; — I’; that determines a strategy for the original normal form game, 
on the basis of the information received. All this construct, is summarized by the data 
(E, {AL jeer, thy (Q) © Ayhjemece, 197 : Hj > T;}jem) that defines an expanded 
game. 


Definition 3.6.1 The data (E, {A(L) hees, {hi (4) E Hj} emer, 15 ; H; =? Dy} jen) 
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defines a correlated equilibrium of the original normal form game if it is a Nash equi- 
librium for the expanded game i.e. if no player can improve his expected payoff by 
changing unilaterally his meta strategy, i.e. by playing ¥;(h;(¢)) instead of 7; (h;(£)) 
when he receives the signal h;(¢) € H}. 


Dd ACOV (FF (Ag (2), Tir lhu- O) 2 D AOV (a (hy), Vg (har-s(0))]- 
(3.75) 


An equivalent definition for m-matrix games 


In the case of an m-matrix game, the definition given above can be replaced with the 
following one, which is much simpler 


Definition 3.6.2 A correlated equilibrium is a probability distribution 1(s) over the 
set of pure strategies S = Sı X Sg... X Sm such that, for every player j and any 
mapping ð; : S; — S; the following holds 


E m(s)Vj((sj, sm-4) > E a(S VEs); sr). (3.76) 


ses ses 


3.7 Bayesian equilibrium with incomplete information 


Up to now we have considered only games where each player knows everything con- 
cerning the rules, the players types (i.e. their payoff functions, their strategy sets) etc... 
We were dealing with games of complete information. In this section we look at a 
particular class of games with incomplete information and we explore the class of so- 
called Bayesian equilibria. 


3.7.1 Example of a game with unknown type for a player 


In a game of incomplete information, some players do not know exactly what are the 
characteristics of other players. For example, in a two-player game, Player 2 may not 
know exactly what the payoff function of Player 1 is. 


Example 3.7.1 Consider the case where Player 1 could be of two types, called 0? and 
6? respectively. We define below two matrix games, corresponding to the two possible 
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types respectively: 


















































6} C1 C2 0? Ci C2 

rı |0,—1 | 2,0 rı | 1.5,—1 | 3.5,0 

rg | 2,1 |3,0 rg| 2,1 3,0 
Game 1 Game 2 


If Player 1 is of type 6', then the bimatrix game 1 is played; if Player 1 is of type 6”, 
then the bimatrix game 2 is played; the problem is that Player 2 does not know the type 
of Player 1. 


3.7.2 Reformulation as a game with imperfect information 


Harsanyi, in [Harsanyi, 1967-68], has proposed a transformation of a game of incom- 
plete information into a game with imperfect information. For the example 3.7.1 this 
transformation introduces a preliminary chance move, played by Nature which decides 
randomly the type 6’ of Player 1. The probability distribution of each type, denoted 
p! and p? respectively, represents the beliefs of Player 2, given here in terms of prior 
probabilities, about facing a player of type 0' or 07. One assumes that Player 1 knows 
also these beliefs, the prior probabilities are thus common knowledge. The information 
structure? in the associated extensive game shown in Figure 3.5, indicates that Player 1 
knows his type when deciding, whereas Player 2 does not observe the type neither, 
indeed in that game of simultaneous moves, the action chosen by Player 1. Call x; 
(resp. 1 — x,) the probability of choosing rı (resp. r2) by Player 1 when he implements 
a mixed strategy, knowing that he is of type 0’. Call y (resp. 1 — y) the probability of 
choosing cı (resp. c2) by Player 2 when he implements a mixed strategy. 


We can define the optimal response of Player 1 to the mixed strategy (y, 1 — y) of 
Player 2 by solving® 
max any + af, (1 — y) 


if the type is 6t, 





6? 8? 
MAR Ui YE g ay) 


if the type is 6”. 


We can define the optimal response of Player 2 to the pair of mixed strategy (x;, 1— 
zi), i = 1,2 of Player 1 by solving 


nax py (0169, + (1 — 21)d8;) + polz2b%; + (1 — 22)09;). 
j=1,2 





>The dotted line in Figure 3.5 represents the information set of Player 2. 
We call af; and b? the payoffs of Player 1 and Player 2 respectively when the type is 0°. 
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(3,0) 
(2,1) 


(3.5,0) 


(1.5,-1) 
(3,0) 


(2,1) 


(2,0) 





(0,- 1) 


Figure 3.5: Extensive game formulation of the game of incomplete information 


Let us rewrite these conditions with the data of the game illustrated in Figure 3.5. First 
consider the reaction function of Player 1 


6, => max{0y+ 2(1—y),2y+3(1—-y)} 
6. => max{l.5y+3.5(1 — y), 2y + 3(1 — y)}. 


We draw in Figure 3.6 the lines corresponding to these comparisons between two linear 
functions. We observe that, if Player 1’s type is 6, he will always choose r2, whatever 
y, whereas, if his type is 6? he will choose r; if y is mall enough and switch to rz when 
y > 0.5. For y = 0.5, the best reply of Player 1 could be any mixed strategy (x, 1 — x). 


Consider now the optimal reply of Player 2. We know that Player 1, if he is of type 
6, xhooses always action rə i.e. x; = 0. So the best reply conditions for Player 2 can 
be written as follows 


max{(pi(—1) + (1 = pi) (#2(—1) + (1 — x2)1)), (pi (0) + (1 — pi) (x2(0) + (1 — x2)0))} 
which boils down to 
max{(1 — 2(1 — p,)x), 0)}. 
. . 1 1 
We conclude that Player 2 chooses action c; if £2 < p’ p) 
and any mixed action with y € [0,1] if z2 = TENE We conclude easily from these 
observations that the equilibria of this game are characterized as follows: 


action cə if x2 > 
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T2 

















0! 0? 
Figure 3.6: Optimal reaction of Player 1 to Player 2 mixed strategy (y, 1 — y) 


xı =0 
ifp < 0.5 z= 0,y = lor z= 1,y = 0 of £2 = Mop) ¥ = 0.5 
if pı > 0.5 t= 0,9 = 1. 


3.7.3 A general definition of Bayesian equilibria 


We can generalize the analysis performed on the previous example and introduce the 
following definitions. Let M be a set of m players. Each player 7 € M may be of 
different possible types. Let ©, be the finite set of types for Player j. Whatever his 
type, Payer j has the same set of pure strategies S;. If 0 = (0),....A8m) E O = 
O1 x ... X Om is a type specification for every player, then the normal form of the 
game is specified by the payoff functions 


A prior probability distribution p(6,, ....0,,) on © is given as common knowledge. We 
assume that all the marginal probabilities are nonzero 


When Player j observes that he is of type 6; € O; he can construct his revised condi- 
tional probality distribution on the types @;,;_; of the other players through the Bayes 


formula ((0,. j) 
P\ Uj, UM-j 
p(6@u_;|6;) = —_—. 
( M jl J p;(0;) 


We can now introduce an expanded game where Nature draws randomly, according to 
the prior probability distribution p(-), a type vector 0 € © for all players. Player-j can 
observe only his own type 0; € O;. Then each player j € M picks a strategy in his 
own strategy set S;. The outcome is then defined by the payoff functions (3.77). 


(3.78) 
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Definition 3.7.1 In a game of incomplete information with m players having respec- 
tive type sets O; and pure strategy sets S;, i = 1,...,m, a Bayesian equilibrium is a 
Nash equilibrium in the “ expanded game ” in which each player pure strategy y; is a 
map from Oj; to Sj. 


The expanded game can be described in its normal form as follows. Each player 7 € M 
has a strategy set I’; where a strategy is defined as a mapping y; : O; — S;. Associated 
with a strategy profile y = (71, - . - , Ym) the payoff to player j is given by 


Vily) = Do aO) > pOl) 


0;€0; Om-j;EOM-j 


As usual, a Nash equilibrium is a strategy profile y* = (97,...,7%,) €e 0 =T. x... x 
Pm such that 


Vly"): Vip Tu- Yy ET; (3.80) 


It is easy to see that, since each p;(0;) is positive, the equilibrium conditions (3.80) 
lead to the following conditions 


1 (0j) = argmaxs es, >O p(Om-5l0;) us (l0, Os); ls, Yir-;(0m-3))) 


Om-;EOM-j 


j € M.(3.81) 


Remark 3.7.1 Since the sets O; and S; are finite, the set T ; of mappings from ©; to S; 
is also finite. Therefore the expanded game is an m-matrix game and, by Nash theorem 
there exists at least one mixed strategy equilibrium. 


3.8 Appendix on Kakutani Fixed-point theorem 


Definition 3.8.1 Let © : R” œ 2R" be a point to set mapping. We say that this 


gees 


gees 


Yr E (zp), k =1,2,..., is such that y? € ®(x9). 


Theorem 3.8.1 Let ® : A + 24 be a point to set upper semicontinuous mapping, 
where A is a compact subset’ of R™. Then there exists a fixed-point for ®. That is, 
there is x* € ®(2*) for some x* € A. 





Tie. a closed and bounded subset. 
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3.9 exercises 


Exercise 3.1: Consider the matrix game. 


3 1 8 
4 10 0 


Assume that the players are in the simultaneous move information structure. Assume 
that the players try to guess and counterguess the optimal behavior of the opponent in 
order to determine their optimal strategy choice. Show that this leads to an unstable 
process. 


Exercise 3.2: Find the value and the saddle point mixed-strategies for the above ma- 
trix game. 


Exercise 3.3: Define the quadratic programming problem that will find a Nash equi- 
librium for the bimatrix game 


(52,50)* (44,44) (44,41) 
(42,42) (46,49)* (39,43) |° 


Verify that the entries marked with a * correspond to a solution of the associated 
quadratic programming problem. 


Exercise 3.4: Do the same as above but using now the complementarity problem 
formulation. 


Exercise 3.5: Consider the two player game where the strategy sets are the intervals 
U, = [0, 100] and U2 = [0, 100] respectively, and the payoffs yı (u) = 25uı +15u1u2— 
4u? and y(u) = 100u2 — 50u; — uyu2 — 2u2 respectively. Define the best reply 
mapping. Find an equilibrium point. Is it unique? 
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Exercise 3.6: Consider the two player game where the strategy sets are the intervals 
U; = [10, 20] and Uz = [0, 15] respectively, and the payoffs yı (u) = 40u; + 5u1u2 — 
2u? and y(u) = 50u2 — 3uyu2 — 2u3 respectively. Define the best reply mapping. 
Find an equilibrium point. Is it unique? 


Exercise 3.7: Consider an oligopoly model. Show that the assumptions 3.5.1 define 
a concave game in the sense of Rosen. 


Exercise 3.6: In example 3.7.1 a correlated equilibrium has been constructed for the 
game 





Cy C2 
r, | 5,1] 0,0 |. 
ro 14,41 1,5 























Find the associated extensive form game for which the proposed correlated equilibrium 
corresponds to a Nash equilibrium. 


Part Il 


Repeated and sequential Games 


Chapter 4 


Repeated games and memory 
Strategies 


In this chapter we begin our analysis of “dynamic games”. To say it in an imprecise 
but intuitive way, the dynamic structure of a game expresses a change over time of the 
conditions under which the game is played. In a repeated game this structural change 
over time is due to the accumulation of information about the “history” of the game. As 
time unfolds the information at the disposal of the players changes and, since strategies 
transform this information into actions, the players’ strategic choices are affected. If 
a game is repeated twice, i.e. the same game is played in two successive stages and a 
reward is obtained at each stage, one can easily see that, at the second stage, the players 
can decide on the basis of the outcome of the first stage. The situation becomes more 
and more complex as the number of stages increases, since the players can base their 
decisions on histories represented by sequences of actions and outcomes observed 
over increasing numbers of stages. We may also consider games that are played over an 
infinite number of stages. These infinitely repeated games are particularly interesting, 
since, at each stage, there is always an infinite number of remaining stages over which 
the game will be played and, therefore, there will not be the so-called “end of horizon 
effect” where the fact that the game is ending up affects the players behavior. Indeed, 
the evaluation of strategies will have to be based on the comparison of infinite streams 
of rewards and we shall explore different ways to do that. In summary, the important 
concept to understand here is that, even if it is the “same game” which is repeated over 
a number of periods, the global “repeated game” becomes a fully dynamic system with 
a much more complex strategic structure than the one-stage game. 


In this chapter we shall consider repeated bimatrix games and repeated concave m- 
player games. We shall explore, in particular, the class of equilibria when the number 
of periods over which the game is repeated (the horizon) becomes infinite. We will 
show how the class of equilibria in repeated games can be expended very much if one 
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allows memory in the information structure. The class of memory strategies allows the 
players to incorporate threats in their strategic choices. A threat is meant to deter the 
opponents to enter into a detrimental behavior. In order to be effective a threat must be 
credible. We shall explore the credibility issue and show how the subgame perfectness 
property of equilibria introduces a refinement criterion that make these threats credible. 


Notice that, at a given stage, the players actions have no direct influence on the 
normal form description of the games that will be played in the forthcoming periods. A 
more complex dynamic structure would obtain if one assumes that the players actions 
at one stage influence the type of game to be played in the next stage. This is the case 
for the class of Markov or sequential games and a fortiori for the class of differential 
games, where the time flows continuously. These dynamic games will be discussed in 
subsequent chapters of this volume. 


4.1 Repeating a game in normal form 


Repeated games have been also called games in semi-extensive form in [Friedman 1977]. 
Indeed this corresponds to an explicit dynamic structure, as in the extensive form, but 
with a game in normal form defined at each period of time. In the repeated game, 
the global strategy becomes a way to define the choice of a stage strategy at each pe- 
riod, on the basis of the knowledge accumulated by the players on the ’history” of the 
game. Indeed the repetition of the same game in normal form permits the players to 
adapt their strategies to the observed history of the game. In particular, they have the 
opportunity to implement a class of strategies that incorporate threats. 


The payoff associated with a repeated game is usually defined as the sum of the 
payoffs obtained at each period. However, when the number of periods to tend to oo, 
a total payoff that implies an infinite sum may not converge to a finite value. There are 
several ways of dealing with the comparison of infinite streams of payoffs. We shall 
discover some of them in the subsequent examples. 


4.1.1 Repeated bimatrix games 


Figure 4.1 describes a repeated bimatrix game structure. The same matrix game is 
played repeatedly, over a number T of periods or stages that represent the passing of 
time. In the one-stage game, where Player | has p pure strategies and Player 2 q pure 
strategies and the one-stage payoff pair associated with the startegy pair (k, £) is given 
by (a4,);=1,2. The payoffs are accumulated over time. As the players may recall the 
past history of the game, the extensive form of those repeated games is quite complex. 
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Figure 4.1: A repeated game 


The game is defined in a semi-extensive form since at each period a normal form (i.e. 
an already aggregated description) defines the consequences of the strategic choices 
in a one-stage game. Even if the same game seems to be played at each period (it is 
similar to the others in its normal form description), in fact, the possibility to use the 
history of the game as a source of information to adapt the strategic choice at each 
stage makes a repeated game a fully dynamic “object”. 


We shall use this structure to prove an interesting result, called the “folk theorem”, 
which shows that we can build an infinitely repeated bimatrix game, played with the 
help of finite automata, where the class of equilibria permits the approximation of any 
individually rational outcome of the one-stage bimatrix game. 


4.1.2 Repeated concave games 


Consider now the class of dynamic games where one repeats a concave game a la 
Rosen. Lett € {0,1,...,7 — 1} be the time index. At each time period t the players 
enter into a noncooperative game defined by 


(M; U5; pilh J E M) 


where as indicated in section 3.4, M = {1,2,...,m} is the set of players, U; is a 
compact convex set describing the actions available to Player j at each period and 
w,(u(t)) is the payoff to player j when the action vector u(t) = (ui(t),...,Um(t)) € 
U = U x U2 x -+ X Um is chosen at period t by the m players. This function is 
assumed to be concave in uj. 


We shall denote à = (u(t) : t = 0,1...,7 — 1) the actions sequence over the 
sequence of 7’ periods. The total payoff of player j over the 7’-horizon is then defined 
by 


Vj) (a) = > v;(u(t)). (4.1) 
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Open-loop information structure 


At period t, each player j knows only the current time ¢ and what he has played in 
periods 7 = 0,1,...,¢— 1. He does not observe what the other players do. We implic- 
itly assume that the players cannot use the rewards they receive at each period to infer 
through e.g. an appropriate filtering the actions of the other players. The open-loop 
information structure actually eliminates almost every aspect of the dynamic structure 
of the repeated game context. 


Closed-loop information structure 


At period t, each player 7 knows not only the current time t and what he has played 
in periods 7 = 0,1,...,¢— 1, but also what the other players have done at previous 
period. We call history of the game at period T the sequence 


h(t) = (u(t):t=0,1..., 7—1). 
A closed-loop strategy for Player 7 is defined as a sequence of mappings 
y : h(t) => U}. 


Due to the repetition over time of the same game, each player j can adjust his choice 
of one-stage action uj to the history of the game. This permits the consideration of 
memory strategies where some threats are included in the announced strategies. For 
example, a player can declare that some particular histories would “trigger” a retalia- 
tion from his part. The description of a trigger strategy will therefore include 


e anominal “mood” of play that contributes to the expected or desired outcomes 
e aretaliation “mood” of play that is used as a threat 


e the set of histories that trigger a switch from the nominal mood of play to the 
retaliation mood of play . 


Many authors have studied equilibria obtained in repeated games through the use of 
trigger strategies (see for example [Radner, 1980], [Radner, 1981], [Friedman 1986]). 


Infinite horizon games 


If the game is repeated over an infinite number of periods then payoffs are represented 
by infinite streams of rewards %;(u(t)), t= 1,2,...,00. 
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Discounted sums: We have assumed that the actions sets U; are compact, therefore, 
since the functions 7,;(-) are continuous, the one-stage rewards 7, (u(t)) are uniformly 
bounded. Hence, the discounted payoffs 


v) = 1 y(ult), 42) 


where 8; € [0, 1) is the discount factor of Player j, are well defined. Therefore, if the 
players discount time, we can compare strategies on the basis of the discounted sums 
of the rewards they generate for each player. 


Average rewards per period: Ifthe players do not discount time the comparison of 
strategies imply the consideration of infinite streams of rewards that sum up to infinity. 
A way to circumvent the difficulty is to consider a limit average reward per period in 


the following way 
T-1 


1 
gŭ) = lim inf 7 2 vult). (4.3) 


To link this evaluation of infinite streams of rewards with the previous one based on 
discounted sums one may define the equivalent discounted constant reward 


9 0) = (1-0) È Aitu). (4.4) 
t=0 
One expects that, for well behaved games, the following limiting property holds true 
lim gj? (0) = g;(ù). (4.5) 


In infinite horizon repeated games each player has always enough time to retaliate . He 
will always have the possibility to implement an announced threat, if the opponents are 
not playing according to some nominal sequence corresponding to an agreement. We 
shall see that the equilibria, based on the use of threats constitute a very large class. 
Therefore the set of strategies with the memory information structure is much more 
encompassing than the open-loop one. 


Existence of a “trivial” dynamic equilibrium 


We know, by Rosen’s existence theorem, that a concave ”static game” 
(M; U;,¥;(-),7 E M) 


admits an equilibrium u*. It should be clear (see exercise 4.1) that the sequence u* = 
(u(t) = u* :t =0,1...,7 — 1) is also an equilibrium for the repeated game in both 
the open-loop and closed-loop (memory) information structure. 
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Assume the one period game is such that a unique equilibrium exists. The open- 
loop repeated game will also have a unique equilibrium, whereas the closed-loop re- 
peated game will still have a plethora of equilibria, as we shall see it shortly. 


4.2 Folk theorem 


In this section we present the celebrated Folk theorem. This name is due to the fact that 
there is no well defined author of the first version of it. The idea is that an infinitely 
repeated game should permit the players to design equilibria, supported by threats, 
with outcomes being Pareto efficient. The strategies will be based on memory, each 
player remembering the past moves of game. Indeed the infinite horizon could give 
raise to an infinite storage of information, so there is a question of designing strategies 
with the help of mechanisms that would exploit the history of the game but with a finite 
information storage capacity. Finite automata provide such systems. 


4.2.1 Repeated games played by automata 


A finite automaton is a logical system that, when stimulated by an input, generates an 
output. For example an automaton could be used by a player to determine the stage- 
game action he takes given the information he has received. The automaton is finite 
if the length of the logical input (a stream of 0 — 1 bits) it can store is finite. So in 
our repeated game context, a finite automaton will not permit a player to determine his 
stage-game action upon an infinite memory of what has happened before. To describe 
a finite automaton one needs the following 


e A list of all possible stored input configurations; this list is finite and each ele- 
ment is called a state. One element of this list is the initial state. 


e An output function determines the action taken by the automaton, given its cur- 
rent state. 


e A transition function tells the automaton how to move from one state to another 
after it has received a new input element. 


In our paradigm of a repeated game played by automata, the input element for the 
automaton a of Player | is the stage-game action of Player 2 in the preceding stage, 
and symmetrically for Player 2. In this way, each player can choose a way to process 
the information he receives in the course of the repeated plays, in order to decide the 
next action. 
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An important aspect of finite automata is that, when they are used to play a repeated 
game, they necessarily generate cycles in the successive states and hence in the actions 
taken in the successive stage-games. 


Let G be a finite two player game, i.e. a bimatrix game, which defines the one- 
stage game. U and V are the sets of pure strategies in the game G for Player 1 and 2 
respectively. Denote by g;(u, v) the payoff to Player j in the one-stage game when the 
pure strategy pair (u,v) € U x V has been selected by the two players. 


The game is repeated indefinitely. Although the set of strategies for an infinitely 
repeated game is enormously rich, we shall restrict the strategy choices of the players 
to the class of finite automata. A pure strategy for Player | is an automaton a € A 
which has for input the actions u € U of Player 2. Symmetrically a pure strategy for 
Player 2 is an automaton b € B which has for input the actions v € V of Player 1. 


We associate with the pair of finite automata (a,b) € A x B a pair of average 
payoffs per stage defined as follows 


z a 
n=1 


where N is the length of the cycle associated with the pair of automata (a,b) and 
(Un; Un) is the action pair at the n-th stage in this cycle. One notices that the expression 
(4.6) is also the limite average reward per period due to the cycling behavior of the two 
automata. 


We call G the game defined by the strategy sets A and B and the payoff functions 
(4.6). 


4.2.2 Minimax point 


Assume Player 1 wants to threaten Player 2. An effective threat would be to define his 
action in a one-stage game through the solution of 


Mz = min max g2(u, v). 


Similarly if Player 2 wants to threaten Player 2, an effective threat would be to define 
his action in a one-stage game through the solution of 


Mı = min max gı (u,v). 


We call minimax point the pair Mm = (Mı, M2) in R?. 


76 CHAPTER 4. REPEATED GAMES AND MEMORY STRATEGIES 


4.2.3 Set of outcomes dominating the minimax point 


Theorem 4.2.1 Let Pm be the set of outcomes in a matrix game that dominate the min- 
imax point Mm. Then the outcomes corresponding to Nash equilibria in pure strategies! 
of the game G are dense in Pm. 


Proof: Let gi, g2,...,gx be the pairs of payoffs that appear in the bimatrix game 
when it is written in the vector form, that is 


Sk = (gı (Uk, Vk), g2(Uk, UE)) 


where (ux, vg) is a possible pure strategy pair. Let qi, q2, . - - , qg be nonnegative ratio- 
nal numbers that sum to 1. Then the convex combination 


g“ = N81 + Q282,- - -, (KEK 


is an element of P if gf > Mı and g3 > Mə . The set of these g is dense in P. Each qk 
being a rational number can be written as a ratio of two integers, that is a fraction. It is 
possible to write the K fractions with a common denominator, hence 
Nk 
dk = N’ 
with 
Ni +n +... +ng =N 


since the q,’s must sum to 1. 


We construct two automata a* and b*, such that, together, they play (u1, v1) dur- 
ing nı stages, (u2, v2) during nə stages, etc. They complete the sequence by playing 
(ux, vg) during nx stages and the N-stage cycle begins again. The limit average per 
period reward of Player 7 in the G game played by these automata is given by 


pe * ok 1 K = ok 

gia ,*)= N De Pages Ue) = 2. qrg; (uk, Ue) = gj. 
Therefore these two automata will achieve the payoff pair g. Now we refine the struc- 
ture of these automata as follows: 


e Let ŭ(n) and (n) be the pure strategy that Player 1 and Player 2 respectively 
are supposed to play at stage n, according to the cycle defined above. 





'A pure strategy in the game G is a deterministic choice of an automaton; this automaton will process 
information coming from the repeated use of mixed strategies in the repeated game. 
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e The automaton a* plays ŭ(n + 1) at stage n + 1 if automaton b has played 0(n) 
at stage n; otherwise it plays u, the strategy that is minimaxing Player 2’s one 
shot payoff and this strategy will be played forever. 


e Symmetrically, the automaton b* plays (n + 1) at stage n + 1 if automaton a 
has played u(n) at stage n; otherwise it plays v, the strategy that is minimaxing 
Player 1’s one shot payoff and this strategy will be played forever. 


It should be clear that the two automata a* and b* defined as above constitute a Nash 
equilibrium for the repeated game G whenever g is dominating îm, i.e. g € Pm. Indeed, 
if Player 1 changes unilaterally his automaton to a then the automaton b* will select 
for all periods, except a finite number, the minimax action v and, as a consequence, the 
limit average per period reward obtained by Player 1 is 


gi(a, b*) < Mi < g(a", d*). 
Similarly for Player 2, a unilateal change to b € B leads to a payoff 
ga(a*,b) < mz < Jala”, b”). 


Q.E.D. 


Remark 4.2.1 This result is known as the Folk theorem, because it has always been in 
the ’folklore” of game theory without knowing to whom it should be attributed. 


The class of equilibria is more restricted if one imposes a condition that the threats 
used by the players in their announced strategies have to be credible. This is the issue 
discussed in one of the following sections. 


4.3 Collusive equilibrium in a repeated Cournot game 


Consider a Cournot oligopoly game repeated over an infinite sequence of periods. Let 
t = 1,...,00 be the sequence of time periods. Let q(t) € R” be the production vector 
chosen by the m firms at period t and w;(q(t)) the payoff to player j at period t. 


We denote 
q=q(1),...,q(t),... 
the infinite sequence of production decisions of the m firms. Over the infinite time 
horizon the rewards of the players are defined as 


v(4) = > Bilat) 
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where 3; € [0, 1) is a discount factor used by Player j. 


We assume that at each period t all players have the same information h(t) which 
is the history of passed productions h(t) = (q(0), q(1),...,q(t— 1)). 


We consider, for the one-stage game, two possible outcomes, 


(i) the Cournot equilibrium, supposed to be uniquely defined by qe = (qj) j=1 


Jeeg 


(ii) a Pareto outcome resulting from the production levels q* = (qj) j=1,..m and such 
that w;(q)* > vj(q°), j = 1,...,m. We say that this Pareto outcome is a 
collusive outcome dominating the Cournot equilibrium. 


A Cournot equilibrium for the repeated game is defined by the strategy consisting 
for each Player j to choose repeatedly the production levels q;(t) = qj, 7 = 1, 2. How- 
ever there are many other equilibria that can be obtained through the use of memory 
strategies. 


Let us define 


Vi = $ Ela) (4.7) 
t=0 
Dla) = max p(l, a). (4.8) 


The following has been shown in [Friedman 1977] 


Lemma 4.3.1 There exists an equilibrium for the repeated Cournot game that yields 
the payoffs V¥ if the following inequality holds 


®;(q*) — Vila) 
~ ;(q*) — ylas) 
This equilibrium is reached through the use of a so-called trigger strategy defined as 
follows 


if (q(0), q(1),.--,a(¢— 1)) = (a*,q*,...,q"*) then g;(t) = Gj 
otherwise q;(t) = qj. 





Bi 2 (4.9) 


(4.10) 


Proof: If the players play according to (4.10) the payoff they obtain is given by 


ie, cc se 
= 2 Bila )= Taua ). 
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Assume Player 7 decides to deviate unilaterally at period 0. He knows that his 
deviation will be detected at period 1 and that, thereafter the Cournot equilibrium pro- 
duction level q° will be played. So, the best he can expect from this deviation is the 
following payoff which combines the maximal reward he can get in period 0 when the 
other firms play g; with the discounted value of an infinite stream of Cournot outcomes. 


pj 
1- 6; 


This unilateral deviation is not profitable if the following inequality holds 


V;(@) = &,(a*) + 





v;(Q°). 





which can be rewritten as 
(1 — 8;)(%;(a") — la") + B;(¥;(a") — vla) = 0, 
from which we get 
pila) — ®;(q") — lhl) — ®;(q"*)) = 0. 
and, finally the condition (4.9) 
s Wl) — Pilat) 
~ 05(q*) — vla’) 


The same reasonning holds if the deviation occurs at any other period t. Q.E.D. 





pj 


Remark 4.3.1 According to this strategy the q* production levels correspond to a 
cooperative mood of play while the q° correspond to a punitive mood of play. The 
punitive mood is a threat. As the punitive mood of play consists to play the Cournot 
solution which is indeed an equilibrium, it is a credible threat. Therefore the players 
will always choose to play in accordance with the Pareto solution q* and thus the 
equilibrium defines a nondominated outcome. 


It is important to notice the difference between the threats used in the folk theorem 
and the ones used in this repeated Cournot game. Here the threats are constituting 
themselves an equilibrium for the dynamic game. We say that the memory (trigger 
strategy) equilibrium is subgame perfect in the sense of Selten [Selten, 1975]. 


4.3.1 Finite vs infinite horizon 


There is an important difference between finitely and infinitely repeated games. If the 
same game is repeated only a finite number of times then, in a subgame perfect equi- 
librium, a single game equilibrium will be repeated at each period. This is due to the 
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impossibility to retaliate at the last period. Then this situation is translated backward 
in time until the initial period. When the game is repeated over an infinite time horizon 
there is always enough time to retaliate and possibly resume cooperation. For exam- 
ple, let us consider a repeated Cournot game, without discounting where the players’ 
payoffs are determined by the long term average of the one sage rewards. If the players 
have perfect information with delay they can observe without errors the moves selected 
by their opponents in previous stages. There might be a delay of several stages before 
this observation is made. In that case it is easy to show that, to any cooperative solution 
dominating a static Nash equilibrium corresponds a subgame perfect sequential equi- 
librium for the repeated game, based on the use of trigger strategies. These strategies 
use as a threat a switch to the noncooperative solution for all remainig periods, as soon 
as a player has been observed not to play according to the cooperative solution. Since 
the long term average only depends on the infinitely repeated one stage rewards, it is 
clear that the threat gives raise to the Cournot equilibrium payoff in the long run. This 
is the subgame perfect version of the Folk theorem [Friedman 1986]. 


4.3.2 A repeated stochastic Cournot game with discounting and 
imperfect information 


We give now an example of stochastic repeated game for wich the construction of 
equilibria based on the use of threats is not trivial. This is the stochastic repeated game, 
based on the Cournot model with imperfect information, (see [Green & Porter, 1985], 
[Porter, 1983]). 


Let X(t) = do, x;(t) be the total supply of m firms at time ¢ and O(t) be a 
sequence of i.i.d? random variables affecting these prices. Each firm j chooses its 
supplies x;(t) > 0, in a desire to maximize its total expected discounted profit 


Vial) o TiC) Eml) = Bo > PDX (0) Oz) ~ EO 
j=l,...,m, 


where 8 < 1 is the discount factor. The assumed information structure does not allow 
each player to observe the opponent actions used in the past. Only the sequence of 
past prices, corrupted by the random noise, is available. Therefore the players can- 
not monitor without error the past actions of their opponent(s). This impossibility to 
detect without error a breach of cooperation increases considerably the difficulty of 
building equilibria based on the use of credible threats. In [Green & Porter, 1985], 
[Porter, 1983] it is shown that there exist subgame perfect equilibria, based on the use 
of trigger strategies, which are dominating the repeated Cournot-Nash equilibrium. 
This construct uses the concept of Markov game and will be further discussed in Part 3 





Independent and identically distributed. 
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of these Notes. It will be shown that these equilibria are in fact closely related to the 
so-called correlated and/or communication equilibria discussed at the end of Part 2. 


4.4 Exercises 


Exercise 5.1. Prove that the repeated one-stage equilibrium is an equilibrium for the 
repeated game with additive payoffs and a finite number of periods. Extend the result 
to infinitely repeated games. 


Exercise 5.2. Show that condition (4.9) holds if 6; = 1. Adapt Lemma 4.3.1 to the 
case where the game is evaluated through the long term average reward criterion. 
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Chapter 5 


Shapley’s Zero Sum Markov Game 


5.1 Process and rewards dynamics 


The concept of Markov game has been introduced, in a zero-sum game framework, 
in [Shapley. 1953]. The structure of this game can also be described as a controlled 
Markov chain with two competing agents. 


Let S = {1,2,...,n} be the set of possible states of a discrete time stochastic 


process {x(t) : t = 0,1,...}. Let U; = {1,2,...,A;}, j = 1,2, be the finite action 
sets of two players. The process dynamics is described by the transition probabilities 


Pss (u) = P[r(t +1) = s'lz(t)= 5, ul, 5,5 € 9, weU,x Uy, 
which satisfy, for all u € U; x U2 


Psu) > 0 
X pasu) = 1, ses. 


SES 


As the transition probabilities depend on the players actions we speak of a controlled 
Markov chain. A transition reward function 


r(s,u), sES, uEeU,x Up, 
defines the gain of player 1 when the process is in state s and the two players take the 


action pair u. Player 2’s reward is given by —r(s,u), since the game is assumed to be 
zero-sum. 
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5.2 Information structure and strategies 


5.2.1 The extensive form of the game 


The game defined above corresponds to a game in extensive form with an infinite 
number of moves. We assume an information structure where the players choose se- 
quentially and simultaneously, with perfect recall, their actions. This is illustrated in 
figure 5.1 















































Figure 5.1: A Markov game in extensive form 
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5.2.2 Strategies 


A strategy is a device which transforms the information into action. Since the players 
may recall the past observations, the general form of a strategy can be quite complex. 


Markov strategies: These strategies are also called feedback strategies. The as- 
sumed information structure allows the use of strategies defined as mappings 


yj: S= PU;), j=1,2, 


where P(U;) is the class of probability distributions over the action set U;. Since 
the strategies are based only on the information conveyed by the current state of the 
x-process, they are called Markov strategies. 


When the two players have chosen their respective strategies the state process be- 
comes a Markov chain with transition probabilities 


At A2 


PRP = eu? ps (uf, uh), 


k=1 €=1 


where we have denoted ape k = 1,..., ig and ype €=1,..., Àz the prob- 
ability distributions induced on U, and U> by 7(s) and y2(s) respectively. In order 
to formulate the normal form of the game, it remains to define the payoffs associated 
with the admissible strategies of the twp players. Let 8 € [0,1) be a given discount 
factor. Player 1’s payoff, when the game starts in state so, is defined as the discounted 
sum over the infinite time horizon of the expected transitions rewards, i.e. 


A2 
V (s0; 71, Y2) = Bays Sy oe (2(0) ype ©) r(xz(t), uf, us)|2(0) = so]. 
i=0 k-12 


Player 2’s payoff is equal to —V (s0; 71, 72). 
Definition 5.2.1 A pair of strategies (y;{, 3) is a saddle point if, for all strategies 7, 
and 2 of Players I and 2 respecively, for all s € S 
V (s; Y1, V2) < V(8; 1,2) < V (s; 77, 72). (5.1) 
The number 


v*(s) = V (s; 71,92) 


is called the value of the game at state s. 
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Memory strategies: Since the game is of perfect recall, the information on which a 
player can base his/her decision at period ¢ is the whole state-history 


h(t) = {2(0),x(1),...,2(t)}. 


Notice that we don’t assume that the players have a direct access to the actions used 
in the past by their opponent; they can only observe the state history. However, as in 
the case of a single-player Markov decision processes, it can be shown that optimality 
does not require more than the use of Markov strategies. 


5.3 Shapley’s-Denardo operator formalism 


We introduce, in this section the powerful formalism of dynamic progaming opera- 
tors that has been formally introduced in [Denardo, 1967] but was already implicit in 
Shapley’s work. The solution of the dynamic programming equations for the stochastic 
game is obtained as a fixed point of an operator acting on the space of value functions. 


5.3.1 Dynamic programming operators 


In a seminal paper [Shapley. 1953] the existence of optimal stationary Markov strate- 
gies has been established via a fixed point argument, involving a contracting op- 
erator. We give here a brief account of the method, taking our inspiration from 
[Denardo, 1967] and [Filar & Vrieze, 1997]. Let v(-) = (V(s) : s € S) be an ar- 
bitrary function with real values, defined over S. Since we assume that S is finite this 
is also a vector. Introduce, for any s € S' the so-called local reward functions 


A(v(-), 8, Ur, U2) = 7(8,,U1, U2) + 6 X` pes(ur, U2) v(s’), (ur, U2) € U1 x Us. 
SES 


Define now, for each s € S, the zero-sum matrix game with pure strategies U; and U: 
and with payoffs 


H(v(-), s) = [A(v(-), 8, uf, us)] ee, S 
> noA 


Denote the value of each of these games by 
T(v(-), 8) = vall[H(v(-), s)| 


and let T(v(-)) = (T(v(-), s) : s € S). This defines a mapping T : R” +> R”. This 
mapping is also called the Dynamic Programming operator of the Markov game. 
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5.3.2 Existence of sequential saddle points 


Lemma 5.3.1 Jf A and B are two matrices of same dimensions, then 


|val[ A] = val|B]| < mox lake = brel- (5.2) 


The proof of this lemma is left as exercise 6.1. 


Lemma 5.3.2 Ifv(-), yı and %2 are such that, forall s € S 


v(s) < (resp. > resp. =) h(v(-), 8, 11(8), 128(8)) 
= r(s, (5), Ys(8)) +8 D7 Pss (l8), 728(s)) o(s") (5.3) 


S'ES 


then 
v(s) < (resp. > resp. =) V(s; %1,2). (5.4) 


Proof: The proof is relatively straightforward and consists in iterating the inequal- 
ity (5.3). QED 


We can now establish the following result 


Theorem 5.3.1 the mapping T is contracting in the “max”-norm 
Ilo) || = max |v(s)| 


and admits the value function introduced in Definition 5.2.1 as its unique fixed-point 


Furthermore the optimal (saddle-point) strategies are defined as the mixed strategies 
yielding this value, i.e. 


h(v*(-), 8, 77 (s), ¥9(s)) = val[H(v*(-), s), 3 €S. (5.5) 


Proof: We first establish the contraction property. We use lemma 5.3.1 and the tran- 
sition probability properties to establish the following inequalities 


r(s, U1, U2) +B 5 Pss (u1, U2) v(s’) 


u1EU1; ugEU2 S'ES 


ITEC) = TH < m| max 
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| 


P D7 pss (u1, u2) (v(s") — o) 


=r (s, U1, u2) — 8 D7 pss (u1, U2) w(s’) 


SES 





max 
sES;u1EU1; u2EU2 


SES 
< A 1 
_ eaa R net, Do Pew (tito) |u(s ) w(s )| 


< , s 2 
= see sei 2 Pes (u1, u2) llo( ) w( Jl 


= llel) -woll 


Hence T is a contraction, since 0 < 8 < 1. By the Banach contraction theorem this 
implies that there exists a unique fixed point v*(-) to the operator T. 


We now show that there exist stationary Markov strategies 7; , 75 for wich the sad- 
dle point condition holds 


V(8;11,%2) < V (5; 11 12) < V8; 11, 12). (5.6) 
Let y}(s),75(s) be the saddle point strategies for the local matrix game with payoffs 

h(v*(-), S, U1, U2)a, EU, ucu = H(v(-), s). (5.7) 
Consider any strategy 72 for Player 2. Then, by definition 


h(v*(-), 8,71 (8), Y2(S) Jur et u2eUe > v"(s) Vs G S. (5.8) 


By Lemma 5.3.2 the inequality (5.8) implies that for all s € S 
Visi 77,92) 2 v"(s). (5.9) 


Similarly we would obtain that for any strategy yı and for all s € S 


V(8;%1,72) < v"(s), (5.10) 
and 

V (s; 91) 72) = (8); (5.11) 
This establishes the saddle point property. QED 


In the single player case (MDP or Markov Decision Process) the contraction and 
monotonicity of the dynamic programming operator yield readily a converging numer- 
ical algorithm for the computation of the optimal strategies [Howard, 1960]. In the 
two-player zero-sum case, even if the existence theorem can also be translated into a 
converging algorithm, some difficulties arise (see e.g. [Filar & Tolwinski, 1991] for a 
recently proposed converging algorithm). 


Chapter 6 


Nonzero-sum Markov and Sequential 
games 


In this chapter we consider the possible extension of the Shapley Markov game formal- 
ism to nonzero-sum games. We shall consider two steps in these extensions: (i) use 
the same finite state and finite action formalism as in Shapley’s and introduce differ- 
ent reward functions for the different players; (ii) introduce a more general framework 
where the state and actions can be continuous variables. 


6.1 Sequential Game with Discrete state and action sets 


We consider a stochastic game played non-cooperatively by players that are not in 
purely antagonistic sutuation. Shapley’s formalism extends without difficulty to a 
nonzero-sum setting. 


6.1.1 Markov game dynamics 


Introduce S = {1,2,...,n}, the set of possible states, U;(s) = {1,2,...,A;(s)},7 = 
1,...,m, the finite action sets at state s of m players and the transition probabilities 


pss (u) = P[a(t +1) = s'|e(t)= s, ul, s,s €S, ueUi(s) x... x Un(s). 


Define the transition reward of player 7 when the process is in state s and the players 
take the action pair u by 


rj(s,u), “Se 8, Weis) x... x Um(s). 
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Remark 6.1.1 Notice that we permit the action sets of the different players to depend 
on the current state s of the game. Shapley’s theory remains valid in that case too. 


6.1.2 Markov strategies 


Markov strategies are defined as in the zero-sum case. We denote u} CO) the proba- 


“4° . . k . . 
bility given to action u; by player j when he uses strategy 7; and the current state is 
T, 


6.1.3 Feedback-Nash equilibrium 


Define Markov strategies as above, i.e. as mappings from S into the players mixed 
actions. Player j’s payoff is thus defined as 


oo A Am 
Vao ie) = Bae oo oan a e a O) 80, 
t=0 k=1 l=1 


Definition 6.1.1 An m-tuple of Markov strategies (y;,..., Y%ž,) is afeedback-Nash equilibrium 
ifforalls € S 





Vn ene eg < Vils gie asa Va) 
for all strategies q; of player j, 3 © M. The number 


* 


vils) = Vils Ti e Vao Ta) 


will be called the equilibrium value of the game at state s for Player j. 





6.1.4 Sobel-Whitt operator formalism 


The first author to extend Shapley’s work to a nonzero-sum framework has been So- 
bel [?]. A more recent treatment of nonzero-sum sequential games can be found in 
[Whitt, 1980]. 


We introduce the so-called local reward functions 


hj(s,vj(-),u) = ry(s,u) +8 D7 pse(u)vj(s’), je M (6.1) 


SES 
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where the functions v;(-) : S — R are given reward-to-go functionals (in this case 
they are vectors of dimension n = card(S)) defined for each player j. The local 
reward (6.1) is the sum of the transition reward for player 7 and the discounted expected 
reward-to-go from the new state reached after the transition. For a given s and a given 
set of reward-to-go functionals v;(-), 7 € M, the local rewards (6.1) define a matrix 
game over the pure strategy sets U;(s), j € M. 


We now define, for any given Markov policy vector y = {7;}jem, an operator 
H,, acting on the space of reward-to-go functionals, (i.e. n-dimensional vectors) and 
defined as 


(Hyv(-))(s) = {Eyo hsls, vl), Ul fem: (6.2) 
We also introduce the operator F} defined as 
(FovQ))(s) = {sup Boo als vs), Wien, (6.3) 


where ŭ is the random action vector and y?’ is the Markov policy obtained when only 
Player j adjusts his/her policy, while the other players keep their y-policies fixed. In 
other words Eq. 6.3 defines the optimal reply of each player 7 to the Markov strategies 
chosen by the other players. 


6.1.5 Existence of Nash-equilibria 


The dynamic programming formalism introduced above, leads to the following pow- 
erful results (see [Whitt, 1980] for a recent proof) 


Theorem 6.1.1 Consider the sequential game defined above, then 


1. The expected payoff vector associated with a stationary Markov policy y is given 
by the unique fixed point v.(-) of the contracting operator H}. 


2. The operator F, is also contracting and thus admits a unique fixed-point f? (-). 
3. The stationary Markov policy y* is an equilibrium strategy iff 


f(s) = vy(s), Vses. (6.4) 


4. There exists an equilibrium defined by a stationary Markov policy. 


Remark 6.1.2 In the nonzero-sum case the existence theorem is not based on a con- 
traction property of the “equilibrium” dynamic programming operator. As a conse- 
quence, the existence result does not translate easily into a converging algorithm for 
the numerical solution of these games (see [?]). 
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6.2 Sequential Games on Borel Spaces 


The theory of non-cooperative markov games has been extended by several authors to 
the case where the state and the actions are in continuous sets. Since we are dealing 
with stochastic processes, the apparatus of measure theory is now essential. 


6.2.1 Description of the game 


An m-player sequential game is defined by the following objects: 
(S, x), Uj, P=) ril, J) Q-, ).8, 


where: 


1. (S,%) is a measurable state space with a countably generated o-algebra of X 
subsets of S. 


2. Uj is a compact metric space of actions for player j. 


3. T;(-) is a lower measurable map from S into nonempty compact subsets of U}. 
For each s, I’;(s) represents the set of admissible actions for player j. 


4. r;(-,-) : S x U & R is a bounded measurable transition reward function for 
player j. This functions are assumed to be continuous on U, for every s € S. 


5. Q(-,-) is a product measurable transition probability from S x U to S. It is 
assumed that Q(-,-) satisfies some regularity conditions which are too techni- 
cal to be given here. We refer the reader to [Nowak, 199?] for a more precise 
statement. 


6. 8 € (0, 1) is the discount factor. 


A stationary Markov strategy for player j is a measurable map 7;(-) from S into the 
set P(U;) of probability measure on U; such that y;(s) € P(T;(s)) for every s € S. 


6.2.2 Dynamic programming formalism 


The definition of local reward functions given in 6.1 in the discrete state case has to 
be adapted to the continuous state format, it becomes 


hj(s,v;(-),u) = r;(s,u EE vi(t)Q(dt|s,u), j EM (6.5) 
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The operators H, and F} are defined as above. The existence of equilibria is difficult to 
establish for this general class of sequential games. In [Whitt, 1980], the existence of 
e-equilibria is proved, using an approximation theory in dynamic programming. The 
existence of equilibria was obtained only for special cases ! 


6.3 Application to a Stochastic Duopoloy Model 


6.3.1 A stochastic repeated duopoly 


Consider the stochastic duopoly model defined by the following linear demand equa- 
tion 

x(t + 1) = a — pfu (t) + u2(t)] + e(t) 
which determines the price x(t + 1) of a good at period t + 1 given the total supply 
u(t) + u2(t) decided at the end of period t by Players 1 and 2. Assume a unit produc- 
tion cost equal to y. The profits, at the end of period t by both players (firms) are then 
determined as 


m(t) = (a(t + 1) — 6)u;(t). 


Assume that the two firms have the same discount rate 3, then over an infinite time 
horizon, the payoff to Player 7 will be given by 


V; = X T(t). 
t=0 


This game is repeated, therefore an obvious equilibrium solution consists to play re- 
peatedly the (static) Cournot solution 





a— ô 
ct) — i= 1.2 6.6 
u(t) Bp Toh (6.6) 
which generates the payoffs 
(a=? 
aam ee PD. (6.7) 
7 9p(1 — 8) 


A symmetric Pareto (nondominated) solution is given by the repeated actions 





4p 
‘see [Nowak, 1985], [Nowak, 1987], [Nowak, 1993], [Parthasarathy Shina, 1989]. 
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and the associated payoffs 


P (a — 6) 


E -Gn 


where ô = a — Ņ. 


The Pareto outcome dominates the Cournot equilibrium but it does not represent 
an equilibrium. The question is the following: 


is it possible to construct a pair of memory strategies which would define 
an equilibrium with an outcome dominating the repeated Cournot strategy 
outcome and which would be as close as possible to the Pareto nondomi- 
nated solution? 


6.3.2 A class of trigger strategies based on a monitoring device 


The random perturbations affecting the price mechanism do not permit a direct exten- 
sion of the approach described in the deterministic context. Since it is assumed that 
the actions of players are not directly observable, there is a need to proceed to some 
filtering of the sequence of observed states in order to monitor the possible breaches of 
agreement. 


In [Green & Porter, 1985] a dominating memory strategy equilibrium is constructed, 
based on a one-step memory scheme. We propose below another scheme, using a mul- 
tistep memory, that yields an outcome which lies closer to the Pareto solution. 


The basic idea consists to extend the state space by introducing a new state variable, 
denoted v which is used to monitor a cooperative policy that all players have agreed 
to play and which is defined as ¢ : v œ u; = $(v). The state equation governing the 
evolution of this state variable is designed as follows 





u(t + 1) = max{—-K, v(t) + w(t +1) — a(t + DY}, (6.8) 
where x° is the expected outcome if both players use the cooperative policy, i.e. 
2°(t-+1) =a — 2p¢(v(t)). 


It should be clear that the new state variable v provides a cumulative measure of the 
positive discrepancies between the expected prices x° and the realized ones x(t). The 
parameter — K defines a lower bound for v. This is introduced to prevent a compen- 
sation of positive discrepancies by negative ones. A positive discrepancy could be an 
indication of oversupply, i.e. an indication that at least one player is not respecting the 
agreement and is maybe trying to take advantage over the other player. 
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If these discrepancies accumulate too fast, the evidence of cheating is mounting 
and thus some retaliation should be expected. To model the retaliation process we 
introduce another state variable, denoted y, which is a binary variable, i.e. y € {0,1}. 
This new state variable will be an indicator of the prevailing mood of play. If y = 1 
then the game is played cooperatively; if y = 0, then the the game is played in a 
noncooperative manner, interpreted as a punitive or retaliatory mood of play. 


This state variable is assumed to evolve according to the following state equation 


1 if y(t)=1 and v(t+1) < A(v(t)) 


0 otherwise, o 


ut+1)={ 


where the positive valued function 0 : v ++ O(v) is a design parameter of this 
monitoring scheme. 


According to this state equation, the cooperative mood of play will be maintained 
provided the cumulative positive discrepancies do not increase too fast from one period 
to the next. Also, this state equation tells us that, once y(t) = 0, then y(t’) = 0 for all 
periods t’ > t, i.e a punitive mood of play lasts forever. In the models discussed later 
on we shall relax this assumption of everlasting punishment. 


When the mood of play is noncooperative, i.e. when y = 0, both players use as 
a punishment (or retaliation) the static Cournot solution forever. This generates the 
expected payoffs Vf, 7 = 1, 2 defined in Eq. (6.7). Since the two players are identical 
we shall not use the subscript 7 anymore. 


When the mood of play is cooperative, i.e. when y = 1, both players use an 
agreed upon policy which determines their respective controls as a function of the 
state variable v. This agreement policy is defined by the function (v). The expected 
payoff is then a function W (v) of this state variable v. 


For this agreement to be stable, i.e. not to provide a temptation to cheat to any 
player, one imposes that it be an equilibrium. Note that the game is now a sequential 
Markov game with a continuous state space. The dynamic programming equation 
characterizing an equilibrium is given below 


W(v) = max{la — 5 — p(d(v) + u)lu 
+4P[v! > 0(v)]V° 
+6P[v! < O(v)JEIW(v')Ieo" < O()]}. (6.10) 





where we have denoted 


v' = max{—K,v+ plu — ¢(v)) —e} 
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the random value of the state variable v after the transition generated by the controls 


(u, @(v)). 


In Eq. (6.10) we recognize the immediate reward [a — 6 — p(@(v) + u)|u of Player 1 
when he plays u while the opponent sticks to (v). This is added to the conditional ex- 
pected payoffs after the transition to either the punishment mood of play, correspond- 
ing to the values y = 0 or the cooperative mood of play corresponding to y = 1. 


A solution of these DP equations can be found by solving an associated fixed point 
problem, as indicated in [Haurie & Tolwinski, 1990]. To summarize the approach we 
introduce the operator 


(TsW)(v,u) = [a—d—p(ut d(v))] ut pla- yo 
FEMEA ES Bis) 
+6 [O WO- dT (6.11) 





where F'(-) and f (-) are the cumulative distribution function and the density probability 
function respectively of the random disturbance £. We have also used the following 
notation 

s =v + plu— o(v)). 


An equilibrium solution is a pair of functions (w(-), (-)) such that 
W(v) = max Ty(W)(v, u) (6.12) 
W(v) = (TW), ġ()). (6.13) 


In [Haurie & Tolwinski, 1990] it is shown how an adapatation of the Howard policy 
improvement algorithm [Howard, 1960] permits the computation of the solution of 
this sort of fixed-point problem. The case treated in [Haurie & Tolwinski, 1990] cor- 
responds to the use of a quadratic function 0(-) and a Gaussian distribution law for 
£. The numerical experiments reported in [Haurie & Tolwinski, 1990] show that one 
can define, using this approach, a subgame perfect equilibrium which dominates the 
repeated Cournot solution. 


In [Porter, 1983], this problem has studied in the case where the (inverse) demand 
law is subject to a multiplicative noise. Then one obtains an existence proof for a 
dominating equilibrium based on a simple one-step memory scheme where the variable 
v satisfies the following equation 


x’ — x(t +1) 
u(t) 
This is the case where one does not monitor the cooperative policy through the use of a 


cumulated discrepancy function but rather on the basis of repeated identical tests. Also 
in Porter’s approach the punishment period is finite. 


v(t+1)= 
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In [Haurie & Tolwinski, 1990] it is shown that the approach could be extended to 
a full fledged Markov game, i.e. a sequential game rather than a repeated game. A 
simple model of Fisheries management was used in that work to illustrate this type of 
sequential game cooperative equilibrium. 


6.3.3 Interpretation as a communication device 


In our approach, by extending the state space description (i.e. introducing the new vari- 
ables v and y), we retained a Markov game formalism for an extended game and this 
has permitted us to use dynamic programming for the characterization of subgame per- 
fect equilibria. This is of course reminiscent of the concept of communication device 
considered in [Forges 1986]for repeated games and discussed in Part 1. An easy ex- 
tension of the approach described above would lead to random transitions between the 
two moods of play, with transition probabilities depending on the monitoring statistic 
v. Also a punishment of random duration is possible in this model. In the next section 
we illustrate these features when we propose a differential game model with random 
moods of play. 


The monitoring scheme is a communication device which receives as input the 
observation of the state of the system and sends as an output a public signal which is 
suggesting to play according to two different moods of play. 
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Part Ill 


Differential games 


Chapter 7 


Controlled dynamical systems 


7.1 A capital accumulation process 


A firm is producing a good with some equipment that we call its physical capital, or 
more simply its capital. At time t we denote by x(t) € R* the amount of capital 
acccumulated by the firm. In the parlance of dynamical systems x(t) is the state vari- 
able. A capital accumulation path over a time interval [¢°, t/] is therefore defined as 
the graph of the function x(-) : [t?, t] — x(t) € R*. In the parlance of dynamical 
systems this will be also called a state trajectory. 


The firm accumulates capital through the investment process. Denote by u(t) € Rt 
the investment realised at time t. This corresponds to the control variable for the 
dynamical systems. We also assume that the capital depreciates (wears out) at a fixed 
rate of u. 

Let us denote (t) the derivative over time dati) The evolution over time of the 
accumulated capital is then described by a differential equation, with initial data x°, 
called the state equation of the dynamical system 





z(t) = u(t) — pat) (7.1) 
cA = x. (7.2) 


Now, once the firm has chosen an investment schedule u(-) : [t°, tf] — u(t) € Rt, and 
the initial capital stock z? € R* being given, there exists a unique capital accumulation 
path that is determined as the unique solution of the differential equation (7.1) with 
initial data (7.2). It is easy to check that since the investment rates are never negative, 
the capital stock will remain non-negative (€ R*). 
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7.2 State equations for controlled dynamical systems 


In a more general setting a controlled dynamical system is described by the the state 
equation 


z(t) = f(x(t), u(t), t) (7.3) 
Oe). Sa (7.4) 
u(t) € UCR” (7.5) 


where x € R” is the state variable, u € R™ is the control variable constrained to 
remain in the set U C R”, f(-,-,-): R” x R” x R — R” is the velocity function and 
x? € R” is the initial state. Under some regularity conditions that we shall describe 
shortly, if the controller chooses a control function u(-) : [t°, tf] > u(t) € U, it results 
a unique solution x(-) : [¢°,t/] — u(t) € R” to the differential equation (7.3), with 
initial condition (7.4). This solution x(-) is called the state trajectory generated by the 
control u(-) from the initial state x°. 


Indeed one must impose some regularity conditions for assuring that the funda- 
mental theorem of existence and uniqueness of the solution to a system of differential 
equations can be invoked. We give below the simplest set of conditions. There are 


more general ones that will be needed later on, but for the moment we shall content 
ourselves with the following 


7.2.1 Regularity conditions 


1. The velovity function f(x, u,t) is C’ in x and continuous in u and t. 


2. The control function u(-) is piecewise continuous. 


7.2.2 The case of stationary systems 


If the velocity function does not depend explicitly on t we say that the system is sta- 
tionary. The capital accumulation process of section 7.1 is an example of a stationary 
system. 
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7.2.3 The case of linear systems 


A linear stationary system is described by a linear state equation 
z(t) = Azx(t) + Bu(t) (7.6) 
r(t) = 2, (7.7) 


where A: n x nand B: n x mare given matrices. 


For linear systems we can give the explicit form of the solution to Eqs. (7.6,7.7). 


a(t) = O(t,0°)2° + P(t, t?) I (s, t°) Bu(s)ds, (7.8) 

where ®(t, t?) is the n x n transfer matrix that verifies 
Talet) = A®(t,t®) (7.9) 
P(t, t) = h. (7.10) 


Indeed, we can easily check in Eq. (7.8) that x(t?) = x°. Furthermore, if one differen- 
tiates Eq. (7.8) over time one obtains 


a(t) = Ad(t,t)x9 + AP(t t) f 2(s.09)Buls)ds 


t 
+6(t,t°) P(t, t°) Bu(t) (7.11) 
Az(t) + Bu(t). (7.12) 
Therefore the function defined by (7.8) satisfies the differential equation (7.6) with 
initial condition (7.7). 


7.3 Feedback control and the stability issue 


An interesting way to implement a control on a dynamical system described by the gen- 
eral state equation (7.3,7.4) is to define the control variable u(t) as a function o (t, x(t)) 
of the current time t and state x(t). We then say that the control is defined by the feed- 
back law o(-,-) : R x R” — R”, or more directly that one uses the feedback control 
a. The associated trajectory will then be the solution to the differential equation 


lt) = Folt, a(t) (7.13) 
where we use the notation 


F(t, a(t)) = F(2(t), o(t, x(t)), t). 


We see that, if F, (-, -) is regular enough, the feedback control o will determine uniquely 


an associated trajectory emanating from x°. 
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7.3.1 Feedback control of stationary linear systems 


Consider the linear system defined by the state equation (7.6,7.7) and suppose that one 


uses a linear feedback law 
u(t) = K(t) x(t), (7.15) 


where the matrix A(t) : m x n is called the feedback gain at time t. The controlled 
system is now described by the dufferential equation 


ilt) = (A+ BK(t)) x(t) (7.16) 
a(t?) = æ. (7.17) 


7.3.2 stabilizing a linear system with a feedback control 


When a stationary system is observed on an infinite time horizon it may be natural 
to use a constant feedback gain K. Stability is an important qualitative property of 
dynamical systems, observed over an infinite time horizon, that is when t/ — oo. 


Definition 7.3.1 The dynamical system (7.16,7.17) is 


asymptotically stable if 





lim ||x(t)|] = 0; (7.18) 


t—0o 


globally asymptotically stable if (7.18) holds true for any initial condition x°. 





7.4 Optimal control problems 


7.5 A model of optimal capital accumulation 


Consider the system described in section 7.1, concerning a firm which accumulates a 
productive capital. The state equation describing this controlled accumulation process 
is (7.1,7.2 


HE = u(t) - ualt) 
z(t?) = z’, 


where x(t) and u(t) are the capital stock and the investment rate at time t, respectively. 
Assume that the output y(t), produced by the firm at time t is given by a production 
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function y(t) = ¢(a(t)). Assume now that this output, considered as an homogenous 
good, can be either consumed, that is distributed as a profit to the owner of the firm, at 
arate c(t), or invested, at a rate u(t). Thus one must have at any instant t 


c(t) + u(t) = y(t) = (z). (7.19) 


Notice that we permit negative values for c(t). This would correspond to an operation 
of borrowing to invest more. The owner wants to maximise the total profit accumulated 
over the time horizon [t°, t] which is given by the integral payoff 
tf 
ah) = f, dt 
if 


= f, OE) - ula. (7.20) 


0 


We have used the notation J(x°; x(-), u(-)) to emphasize the fact that the payoff de- 
pends on the initial capital stock and the investment schedule, which will determine 
the capital accumulation path, that is the trajectory x(-). The search for the optimal 
investment schedule can now be formulated as follows 

tf 


Ra f BOO) — u(t) (7.21) 
s.t. 

tlt) = u(t)—pa(t) (7.22) 

x(t?) x (7.23) 
u(t) > 0 (7.24) 
t © He,t. 


Notice that, in this formulation we consider that the optimization is performed over 
the set of admissible control functions and state trajectories that must satisfy the con- 
straints given by the state equation, the initial state and the non-negativity of the in- 
vestment rates, at any time t € [¢°,t/]. The optimisation problem (7.21-7.24) is an 
instance of the generic optimal control problem that we describe now. 


7.6 The optimal control paradigm 


We consider the dynamical system introduced in section 7.2 with a criterion function 
tf 


T(t, 29: #(-), u(-)) = f L(2(t), u(t), t) dt (1.25) 


0 


that is to be maximised through the appropriate choice of an admissible control func- 
tion. This problem can be formulated as follows 
tf 


max, i _ L(t), u(t), t) dt (7.26) 
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s.t. 
tt) = ft), ut), t) (7.27) 
Bie = x (7.28) 
u(t) € UCR” (7.29) 
t e [t°,t]. 


7.7 The Euler equations and the Maximum principle 


We assume here that U = R” and that the functions f(x, u,t) and L(x, u, t) are both 
C! in x and u. Under these more stringrnt conditions we can derive necessary con- 
ditions for an optimal control by using some simple arguments from the calculus of 
variations. These consitions are also called Euler equations as they are equivalent to 
the celebrated conditions of Euler for the classical variational problems. 


Definition 7.7.1 The functions ôx(-) and 6u(-) are called variations locally compati- 
ble with the constraints if they satisfy 


dat) = fe(a(t), ult), t) da(t) + fulw(t), u(t), t) Sult) (7.30) 
or) = 0. (7.31) 


Lemma 7.7.1 If u*(-) and x*(-) are the optimal control and the associated optimal 
trajectories, then the differential at (x*(-),u*(-)) of the performance criterion must be 
equal to 0 for all variations that are locally compatible with the constraints. 


The differential at (a*(-), u*(-)) of the integral performance criterion is given by 

tf 

[Lela* (t) u” (t), t) a(t) + Lula* (t), u* (t), t) du(t)] dt. 
(7.32) 

In order to express that we only consider variations dx(-) and 6u(-) that are compatible 

with the constraints, we say that du(-) is chosen arbitrarily and dx(-) is computed 

according to the variational equations” (7.30,7.31). 


51(a%2*(-),u"() = f 


0 


Now we use a “trick”. We introduce an auxiliary function \(-) : [¢°,t/] —> R”, 
where X(t) is of the same dimension as the state variable x(t) and will be called the 
co-state variable. If Eq. 7.30 is verified we may thus rewrite the expression (7.32) of 
dJ as follows 

if 


at | [L (a* (©), u(t), t) a(t) + Lula” (©), u*(t), t) du(t) + 


0 
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(t)'{—dax(t) + fo(a*(t), u*(t), t) dx(t) + 
fu(z*(t), u*(t), t) Sult) )} dt. (7.33) 


We integrate by parts the expression in A(t) dx(t) and obtain 


J E de(t) dt = FY öz(th) — AY öz() — a A(t) x(t) dt. 


10 t 


We can thus rewrite Eq. (7.33) as follows (we don’t write anymore the arguments 


(2*(¢), u (t), t)) 


J = —AX(#)’ da(tt) + AMY x(t?) 
[tle + XY fede) + AY Set 
+ {Ly + Mt) f,dx(t)} du(t)] at. (7.34) 
Let us introduce the Hamiltonian function 
H(A, 2,u,t) = L(2,u,t) + X f(a, u, t). (7.35) 
Then we can rewrite Eq. (7.34) as follows 
J = —X(#Y dx(tt) + AMY 6x(t°) 
[0.2.0.0 + MOY r0) 
+H (A(t), v*(t), u*(t), t) du(t)] dt. (7.36) 


As we can choose the function X(-) as we like, we may impose that it satisfy the 
following adjoint variational equations 


M(t)’ = H(A), 2*(t), u” (t), t) (7.37) 
ME) = 0. (7.38) 


With this particular choice of A(-) the differential 6J writes simply as 
6J = AY dx(t°) + H (Alt), x*(t), u* (t), t) du(t)] dt. (7.39) 


Since z(t?) = 0 and du(t) is arbitrary, 6.J = 0 for all variations locally compatible 
with the constraints only if 


H,,(A(é), *(t), u* (t), t) = 0. (7.40) 


We can summarise Eqs. (7.39-7.43 in the following theorem 
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Theorem 7.7.1 If u*(-) and x*(-) are the optimal control and the associated optimal 
trajectory, then there exists a function \(-) : [t?, tf] — R” which satisfies the adjoint 
variational equations 


Mt! = —H,(A(t),2*(t),u*(t),t) tefte, t (7.41) 
AGES = 0, (7.42) 
and such that 
H(A), Eut), t) = 0, tet? t], (7.43) 
where the hamiltonian function H is defined by 
H(A, x,u, t) = Laut) +A f(x,u,t), tet, t]. (1.44) 


These necessary optimality conditions, called the Euler equations, have been general- 
ized by Pontryagin [?] for dynamical systems where the velocity 


f(x,u,t) 


and the payoff rate 
L(x, u,t) 


are C! un x, continuous in u and t, and where the control constraint set U is compact. 
The following theorem is the famous ”’Maximum principle” of the theory of optimal 
control. 


Theorem 7.7.2 If u*(-) and x*(-) are the optimal control and the associated optimal 
trajectory, then there exists a function A(-) : |t?, tf] — R” which satisfies the adjoint- 
variational equations 


Mt! = -HAH rH, ut), t) tefte, t (7.45) 
AG) = 0, (7.46) 

and such that 
A(X(t), z* (t) u” (t), t) = max H (A(t), £" (t), u, t), t e [te, t”, (7.47) 


where the hamiltonian function H is defined as in Eq. (7.44). 


7.8 An economic interpretation of the Maximum Prin- 
ciple 


It could be shown that the co-state variable A(t) indicates the sensitivity of the perfor- 
mance criterion to a marginal variation of the state x(t). It can therefore be interpreted 
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as a marginal value of the state variable. Along the optimal trajectory, the Hamiltonian 
is therefore composed of two ”values”; the current payoff rate L(x* (t), u*(t),t) and 
the value of the current state modification A(t)’ f (x* (t), u*(t),t). This is this global 
value that is maximised w.r.t. u at any point of the optimal trajectory. 


The adjoint variational equations express the evolution of the marginal value of the 
state variables along the trajectory. 


7.9 Synthesis of the optimal control 


7.10 Dynamic programming and the optimal feedback 
control 


Theorem 7.10.1 Assume that there exist a function V*(-,-) : R x R” — Randa 
feedback law o* (t, x) that satisfy the following functional equations 


ð * = ð * * 
= (a= max H (5V (t,x), c“ (t) u, t), Vt, Yz (7.48) 
= aÈ (t,x), c° (t), ož (t,x), t), Vt, Vz (7.49) 
aS 0: (7.50) 


Then V*(t', x*) is the optimal value of the performance criterion, for the optimal con- 
trol problem defined with initial data x(t) = x’, tê € |t°, tf]. Furthermore the solution 
of the maximisation in the R.H.S. of Eq. (7.48) defines the optimal feedback control. 


Proof: 


Remark 7.10.1 Notice here that the partial derivatives Zv* (t,x) appearing in the 
Hamiltonian in the R.H.S. of Eq. (7.48) is consistent with the interpretation of the co- 
state variable A(t) as the sensitivity of the optimal payoff to marginal state variations. 
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7.11 Competitive dynamical systems 


7.12 Competition through capital accumulation 


7.13 Open-loop differential games 


E, a) aC), tml) = 


tf 


f L, (a(t), u(t), ..- , Um(t), t) dt (1.51) 
s.t. 
tlt) = flælt), ult), .-.,umlt), t) (1.52) 
x(t?) z? (7.53) 
u(t) E€ U CR™ (7.54) 
t e W, t]. 


7.13.1 Open-loop information structure 


Each player knows the initial data t°, x° and all the other elements of the dynamical 
system; Player j selects a control function u;(-) : [t°, tf] + U;; the game is played as 
a single simultaneous move. 


7.13.2 An equilibrium principle 


Theorem 7.13.1 Jf u*(-) and x*(-) are the Nash equilibrium controls and the asso- 
ciated optimal trajectory, then there exists m functions j(-) : [t®, tf] — R” which 
satisfy the adjointvariational equations 
A(t’ = H; (Alt) r(t), ut) te e,t] (7.55) 
At) = 0, (7.56) 
and such that 
H(A; (t), 2*(t), u*, t) = max H(A(t), 2*(t), u, u], t), t © [t°,t/], (7.57) 


uz EU; 
where the hamiltonian function H; is defined as follows 


A,(Aj,2,u,t) = L(a,u,t) + rj f(a, u,t). (7.58) 
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7.14 Feedback differential games 


7.14.1 Feedback information structure 


Player j can observe time and state (t, x(t)); he defines his control as the result of a 
feedback law o;(t, x); the game is played as a single simultaneous move where each 
player announces the strategy he will use. 


7.14.2 A verification theorem 


Theorem 7.14.1 Assume that there exist m functions V;*(-,-) : R x R” — Randm 
feedback strategies o*~/(t, x) that satisfy the following functional equations 


o ð ; 
——V* = (V * ae eye 
Bere) = mM eee a 
te [t,t], s ER” (7.59) 
VEGA = 0. (7.60) 


Then Vř (t’, x) is the equilibrium value of the performance criterion of Player j, for 
the feedback Nash differential game defined with initial data x(t’) = x’, t € [t°, t/]. 
Furthermore the solution of the maximisation in the R.H.S. of Eq. (7.59) defines the 
equilibrium feedback control of Player j. 


7.15 Why are feedback Nash equilibria outcomes dif- 
ferent from Open-loop Nash outcomes? 


7.16 The subgame perfectness issue 


7.17 Memory differential games 


7.18 Characterizing all the possible equilibria 
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Part IV 


A Differential Game Model 
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in this last part of our presentation we propose a stochastic differential games model 
of economic competition through technological innovation (R&D competition) and we 
show how it is connected to the theory of Markov and/or sequential games discussed 
in Part 2. This example deals with a stochastic game where the random disturbances 
are modelled as a controlled jump process. 


In control theory, an interesting class of stochastic systems has been studied, where 
the random perturbations appear as controlled jump processes. In [?], [?] the relation- 
ship between these control models and discrete event dynamic programming models is 
developed. In [?] a first adaptation of this approach to the case of dynamic stochastic 
games has been proposed. Below we illustrate this class of games through a model of 
competition through investment in R&D. 


7.19 A Game of R&D Investment 


7.19.1 Dynamics of R&D competition 


We propose here a model of competition through R&D which extends earlier formu- 
lations which can be found in [?]. Consider m firms competing on a market with 
differentiated products. Each firm can invest in R&D for the purpose of making a 
technological advance which could provide it with a competitive advantage. The com- 
petitive advantage and product differentiation concepts will be discussed later. We fo- 
cuss here on the description of the R&D dynamics. Let x; be the level of accumulation 
of R&D capital by the firm j. The state equation describing the capital accumulation 
process is 
tilt) = uj(t) — uzit) 
(7.61) 


a;(0) = 2}, 


where the control variable u; € U; gives the investment rate in R&D and pu; is the 
depreciation rate of R&D capital. 


We represent firm’s j policy of R&D investment as a function 
u;(-) : [0, 00) — U}. 


Denote U; the class of admissible functions u,(-). With an initial condition a;(0) = x? 
and a piecewise continuous control function u;(-) € U; is associated a unique evolution 


x;(-) of the de capital stock, solution of equation (7.61). 


The R&D capital brings innovation through a discrete event stochastic process. Let 
T be a random stopping time which defines the date at which the advance takes place. 
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Consider first the case with a single firm j. The elementary conditional probability is 
given by 


where liMma—o ad) = 0 uniformly in xj. The function w;(x;) represents the controlled 
intensity of the jump process which describes the innovation. The probability of having 


a technological advance occuring between times t and t + dt is given by 


Puolt<T<t+dt] = w;(2,(t))e7 Jo eses 44 + o(dt) (7.63) 


Therefore the probability of having the occurence before time 7 is given by 


Peoia = aa e fo sae, (1.64) 
Equations (7.62-7.64) define the dynamics of innovation occurences for firm 7. 


Since there are m firms the combined jump rate will be 
w(x(t)) = X wlz). 
j=1 


Given that a jump occurs at time 7, the conditionnal probability that the innovation 
come from firm j is given by 

wj(x;(T)) 

w(x(r)) 


The impact of innovation on the competitive hedge of the firm is now discussed. 


7.19.2 Product Differentiation 


We model the market with differentiated products as in [?]. We assume that each 
firm j markets a product characterized by a quality index q; € R and a price p; € R. 
Consider first a static situation where N consumers are buying this type of goods. It is 
assumed that the (indirect) utility of a consumer buying variant 7 is 


Vi = y — pj + 0qj + £j, (7.65) 


where 0 is the valuation of quality and the £; are i.i.d., with standard deviation p, 
according to the double exponential distribution (see [?] for details). Then the demand 
for variant j is given by - 

D; = NP; (7.66) 

where 

p — _ SPa = p)/] 

J m ’ 
die expl(Ogi — pi) / tl 





j=l,...,m. (7.67) 
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This corresponds to a multinomial logit demand model. 


In a dynamic framework we assume a fixed number NV of consumers with a demand 
rate per unit of time given by (7.65-7.67). But now both the quality index q;(-) and the 
price p,(-) are function of t. Actually they will be the instruments used by the firms to 
compete on this market. 


7.19.3 Economics of innovation 


The economics of innovation is based on the costs and the advantages associated with 
the R&D activity. 


R&D operations and investment costs. Let L;(x;,u;) be a function which defines 
the cost rate of R&D operations and investment for firm j. 


Quality improvement. When a firm j makes a technological advance, the quality 
index of the variant increases by some amount. We use a random variable Aq;(7)) 
to represent this quality improvement. The probability distribution of this random 
variable, specified by the cumulative probability function F}(-, xj), is supposed to also 
depend on the current amount of know-how, indicated by the capital «;(7). The higher 
this stock of capital is, the higher will be the probability given to important quality 
gains. 


Adjustment cost. Let $;(x;) be a monotonous et decreasing function of x; which 
represents the adjustment cost for taking advantage of the advance. The higher the 
capital x; at the time of the innovation the lower will be the adjustment cost. 


Production costs Assume that the marginal production cost c; for firm 7 is constant 
and does not vary at the time of an innovation. 


Remark 7.19.1 We implicitly assume that the firm which benefits from a technological 
innovation will use that advantage. We could extend the formalism to the case where a 
firm stores its technological advance and delays its implementation. 
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7.20 Information structure 


We assume that the firms observe the state of the game at each occurence of the discrete 
event which represents a technology advance. This system is an instance where two 
dynamics, a fast one and a slow one are combined, as shown in the definition of the 
state variables. 


7.20.1 State variables 


Fast dynamics. The levels of capital accumulation x;(t), 7 = 1,...,m define the 
fast moving state of this system. These levels change according to the differential state 
equations (7.61). 


Slow dynamics. The quality levels q;(t), 7 = 1,...,m define a slow moving state. 
These levels change according to the random jump process defined by the jump rates 
(7.61) and the distribution functions F;;(-,«;). If firm j makes a technological advance 
at time 7 then the quality index changes abruptly 


q(T) = gT) + Aq;(T). 


7.20.2 Piecewise open-loop game. 


Assume that at each random time 7 of technological innovation all firms observe the 
state of the game. i.e the vectors 


s(T) = (x(7), a(7)) = (x;(T), 49(7))j=1,...m- 


Then each firm j selects a price schedule p;(-) : [r, c0) — R and an R&D investment 
schedule u;(-) : [r, o0) + R which are defined as open-loop controls. These controls 
will be used until the next discrete event (technological advance) occurs. We are thus 
in the context of piecewise deterministic games as introduced in [?]. 


7.20.3 A Sequential Game Reformulation 


In [?] it is shown how piecewise deterministic games can be studied as particular in- 
stances of sequential games with Borel state and action spaces. The fundamental ele- 
ment of a sequential game is the so-called local reward functionals which permit the 
definition of the dynamic programming operators. 
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Let v;(s) be the expected reward-to-go for player j when the system is in initial 
state s. The local reward functional describes the total expected payoff for firm 7 when 
all firms play according to the open-loop controls p;(-) : [7,00) —> R and u,(-) : 
[7,0co) + R until the next technological event occurs and the subsequent rewards are 
defined by the v;(s) function. We can write 


heya) = Eso |S DOO- c) 
—L;(ax;(t), u,(t))} dt + e™v;(s(T))], (7.68) 


TS dee Ms 


where p is the discount rate, 7 is the random time of the next technological event and 
s(7) is the new random state reached right after the occurence of the next technological 
event, i.e. at time 7. From the expression (7.68) we see immediately that: 


1. We are in a sequential game formalism, with continuous state space and func- 
tional sets as action spaces (oo-horizon controls). 


2. The price schedule p;(t), t > 7 can be taken as a constant (until the next techno- 
logical event) and is actually given by the solution of the static oligopoly game 
with differentiated products of qualities p;(t), j = 1,...,m. The solution of this 
oligopoly game is discussed in [?], where it is shown to be uniquely defined. 


3. The R&D investment schedule is obtained as a trade-off between the random 
(due to the random stopping time 7) transition cost given by fj e~"L, (x(t), u;(t)) dt 
and the expected reward-to-go after the next technological event. 


We shall limit our discussion to this definition of the model and its reformulation 
as a sequential game in Borel spaces. The sequential game format does not lend itself 
easily to qualitative analysis via analytical solutions. A numerical analysis seems more 
appropriate to explore the consequences of these economic assumptions. The reader 
will find in [?] more information about the numerical approximation of solutions for 
this type of games. 
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